This article will explain step-by-step how to retrieve a remote web page using ASP, save it locally, and avert all kinds of disaster along the way. As a bonus, I’ll provide a demonstration of how to parse our saved file for the information we really want.
Contributed by Justin Cook, May 10, 2004
Perusing the posts on the ASP Free forums, I've stumbled upon a number of similarly interesting requests. My first reaction to these requests was to reminisce, to think about hockey pools and fine dining. My second reaction was to answer the posts. My third, following the second by mere milliseconds, was the realization that the answer will take up a couple of pages, so why not write an article on the subject, so that all can benefit?
To clarify: a number of times now I've seen people asking the same question: how the heck can I use ASP (or even just VBS) to retrieve the contents of a web page and save it locally?
Well, it so happens that at one time I was faced with that precise problem. The story goes like this: one day I was coding away happily when one of my coworkers came to me in absolute distress! He was in charge of running the hockey pool at work, and a serious issue had arisen. Previously, the website offering the hockey stats had provided a CSV file for download, but after a restructuring, the site had begun to provide the stats via a web page only. It was absolutely critical to have the stats in CSV format, and he promised me and my girlfriend (now wife) dinner at the fine restaurant of our choosing if I could deliver them.
Now your needs may pale in comparison to the importance of a hockey pool (that was sarcasm...), but I promise you that after reading, your inability to step down from a challenge will be rewarded by the tools and the know-how to accomplish said task. Oh, and I'll also show you how I parsed the file after saving it.
I have used this method quite successfully and efficiently in a publishing process built into my Content Management System, but once again, not nearly as important as hockey statistics.
Please keep in mind that at the time, web services were not available for use. Also, I was using classic ASP, not .NET, where you would handle this task much differently.
Before We Proceed
This article will make use of the MSXML2.ServerXMLHTTP object as well as the Proxy Configuration Tool provided by Microsoft. Make sure you have both of these installed on the machine from which the script is running. If through ASP, they need to be installed on the server, and if through VBS, on your client machine.
As always, we'll begin at the beginning. Let us first declare all variables for use within the script, and instantiate some of the objects that we'll be using:
Dim objRead, objWrite, objShell, objXML, objFile, objFSO
Dim location, thisFolder, skaters, strURL

Set objFSO = Server.CreateObject("Scripting.FileSystemObject")
Set objShell = Server.CreateObject("WScript.Shell")
Set objXML = Server.CreateObject("MSXML2.ServerXMLHTTP")
If you receive an error at this point to the effect of "CreateObject Failed", there's a very good chance that the XML Parser is not yet properly installed. This only leads me to believe that you skipped over the 'Before We Proceed' section. Now would be a great time to go back there, and install the necessary components before you proceed!
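One way to get a clearer message than a bare "CreateObject Failed" is to probe each component before relying on it. The following is a minimal sketch and not part of the original script: the helper name CanCreate is hypothetical, and it uses VBScript's built-in CreateObject (rather than Server.CreateObject) so that it can run both inside an ASP page and as a standalone .vbs file.

```vbscript
' Hypothetical helper (not from the original script): returns True when
' the given ProgID can be instantiated on this machine.
Function CanCreate(progId)
    Dim obj
    On Error Resume Next            ' trap the "CreateObject failed" error
    Set obj = CreateObject(progId)
    CanCreate = (Err.Number = 0)
    Err.Clear
    On Error GoTo 0
    Set obj = Nothing
End Function

' Usage: collect the names of any components that are missing.
Dim progIds, item, missing
progIds = Array("Scripting.FileSystemObject", "WScript.Shell", "MSXML2.ServerXMLHTTP")
missing = ""
For Each item In progIds
    If Not CanCreate(item) Then missing = missing & item & " "
Next
```

If missing is not empty after the loop, it names the components that still need to be installed.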
I just want to quickly relate how a near disaster was averted by the next bit of code. See, I initially designed the script at home, and it worked wonderfully. I emailed the script to my coworker without testing it in my work environment, making the same assumption that I'm sure many do, that if it works here it will work anywhere. But as Murphy's Law would dictate, it did not work for him. Not at all.
It was quickly deduced that our proxy server was preventing my wonderful invention from working. Beads of sweat were forming on my brow. The probability of enjoying that fine dinner suddenly seemed very... improbable.
After some digging, I found that Microsoft offers a little tool to deal with exactly this problem, called ProxyCfg.exe, or the Proxy Configuration Tool. As you've already installed it per my instructions previously, let's get to work using it!
'=== ASP
Dim strPath, objScript
strPath = Server.MapPath( Request.ServerVariables("SCRIPT_NAME") )
Set objScript = objFSO.GetFile( strPath )
Set thisFolder = objScript.ParentFolder
thisFolder = objFSO.GetFolder
It's that simple! However, there is much opportunity for error here. The reason is that I've just left the ProxyCfg.exe in the same folder as the script. If that folder is in any way protected or inaccessible to the script, the file will not be run, and an error will be generated. Make sure the ProxyCfg.exe is in an accessible folder!
Also, to save the hassle of manually typing in the proxy settings with the proper switches to the tool, we just give it a " -u". This tells it to find Internet Explorer's settings, and mimic them. Easy enough! Oh, and if you're doing this from a VBS file instead of ASP, you may want to include the next line to allow time to breathe whilst the ProxyCfg.exe runs.
WScript.Sleep( 4000 )
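For reference, running the tool with the " -u" switch could be sketched roughly as follows. This is an illustration under assumptions rather than the author's exact code: the helper names BuildProxyCfgCommand and RunProxyCfg and the folder C:\scripts are hypothetical, and the sketch presumes proxycfg.exe sits in the folder you pass in. The one documented detail it leans on is that the Run method of WScript.Shell accepts a third argument which, when True, makes the call wait until the program exits.

```vbscript
' Hypothetical helper: builds the ProxyCfg command line for a given folder.
' The path is quoted so that folders containing spaces still work.
Function BuildProxyCfgCommand(folderPath)
    BuildProxyCfgCommand = """" & folderPath & "\proxycfg.exe"" -u"
End Function

' Hypothetical helper: runs ProxyCfg and returns its exit code.
' Window style 0 hides the console window; True makes Run wait for completion.
Function RunProxyCfg(folderPath)
    Dim shell
    Set shell = CreateObject("WScript.Shell")
    RunProxyCfg = shell.Run(BuildProxyCfgCommand(folderPath), 0, True)
    Set shell = Nothing
End Function

' Usage (assumed folder):
' exitCode = RunProxyCfg("C:\scripts")
```

Because Run blocks until ProxyCfg has finished in this variant, a fixed pause may not be necessary, although keeping one does no harm.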
Now that we've ensured passage through the Proxy server, we can define the URL of the page we want to get, and go about getting it.
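The retrieval step could look roughly like the sketch below. It is written under assumptions and is not the author's listing: the function name FetchPage and the example URL are hypothetical placeholders. It relies only on the open, send, status, and responseText members of MSXML2.ServerXMLHTTP.

```vbscript
' Hypothetical helper: fetches a URL and returns the response body as a
' string, or an empty string when the server does not answer with HTTP 200.
Function FetchPage(url)
    Dim http
    Set http = CreateObject("MSXML2.ServerXMLHTTP")
    http.open "GET", url, False     ' False = synchronous request
    http.send
    If http.status = 200 Then
        FetchPage = http.responseText
    Else
        FetchPage = ""
    End If
    Set http = Nothing
End Function

' Usage (placeholder URL; substitute the page you actually need):
' strURL = "http://www.example.com/stats/skaters.html"
' skaters = FetchPage(strURL)
```

Checking the status code before using the body helps avoid saving an error page in place of the data.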
You should experience no errors here, because you've already installed the XML Parser, right? Ok, now that we have the page stored in a variable called skaters, let's save it locally.
'=== create the file if it does not exist yet
If Not objFSO.FileExists("skaters.txt") Then objFSO.CreateTextFile("skaters.txt")
Set objFile = objFSO.GetFile("skaters.txt")
'=== 2 = ForWriting, -2 = TristateUseDefault (system default encoding)
Set objWrite = objFile.OpenAsTextStream( 2, -2 )
'Response.Write( skaters )
'Response.End()
objWrite.Write( skaters )
objWrite.Close()
Set objFile = Nothing
Set objWrite = Nothing
Here we've simply saved the info to a text file in the same directory. This is only for convenience. You can of course save it as a .htm file, if for example you want to see the saved page in a browser. You could also build a fancy directory structure to reflect the hierarchy of the published pages of a CMS. Or really, you can do whatever you wish to do with the captured file, I'm not going to stop you!
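If you want the saved copy to land in a specific folder or under a different extension, the CreateTextFile method of FileSystemObject can create or overwrite the file in a single call. The following is a minimal sketch, assuming a hypothetical helper named SaveText and a full path supplied by the caller; the third argument of CreateTextFile selects Unicode output when True.

```vbscript
' Hypothetical helper: writes text to fullPath, replacing any existing file.
' asUnicode = True stores the file as Unicode instead of the default ANSI.
Sub SaveText(fullPath, text, asUnicode)
    Dim fso, stream
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set stream = fso.CreateTextFile(fullPath, True, asUnicode)
    stream.Write text
    stream.Close
    Set stream = Nothing
    Set fso = Nothing
End Sub

' Usage (the folder is an assumption): save the page as .htm for viewing.
' SaveText "C:\inetpub\wwwroot\cache\skaters.htm", skaters, False
```

Passing a full path avoids any doubt about which directory a bare file name resolves to.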
You may receive some crazy, buggy error here. It will almost invariably be on the objWrite.Write() line. Try un-commenting the two lines above that line, and you should see the received page just fine. But look closely: do you see question marks in odd places? That has to do with the encoding of the page; it may contain some non-English Unicode characters.
If this happens, the only hope is if you have some control over the content on the page. The content must be scanned for these 'illegal' characters, which must then be replaced with their ISO-8859 equivalents. Then and only then will you be able to work with it. I have created a function to do this; feel free to write me for it.
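The author's function is not shown, so the following is only a guess at the general idea and not his implementation. This hypothetical helper, ReplaceWideChars, swaps every character above code 255 for an HTML numeric character reference, which a browser renders as the original character. The cut-off of 255 is an assumption; the right threshold depends on the code page your server writes text files in.

```vbscript
' Hypothetical helper: replaces characters above code 255 with HTML
' numeric character references (for example, the euro sign becomes &#8364;).
Function ReplaceWideChars(text)
    Dim i, ch, code, result
    result = ""
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        code = AscW(ch)
        ' AscW returns a signed 16-bit value; shift negatives into 0..65535
        If code < 0 Then code = code + 65536
        If code > 255 Then
            result = result & "&#" & code & ";"
        Else
            result = result & ch
        End If
    Next
    ReplaceWideChars = result
End Function

' Usage: clean the page before writing it to an ANSI text file.
' skaters = ReplaceWideChars(skaters)
```

Building the result by repeated concatenation is slow for very large pages, but it keeps the sketch short.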
You may just need to save the file locally, and nothing more. If that's the case, feel free to skip right to the clean-up section. But most likely you'll need to do some work on the file now that we have it.
I'm going to show you how I read the file line-by-line, looking for indicators of specific columns of information. Looking back, I would for sure use regular expressions to do this searching were I to re-write the script. If you're only looking for a couple of items, this is one way to tackle it, but for anything more I would highly recommend using a regular expression.
If Not objFSO.FileExists("csv.txt") Then objFSO.CreateTextFile("csv.txt")
Set objFile = objFSO.GetFile("csv.txt")
Set objWrite = objFile.OpenAsTextStream( 2, -2 )
Set objFile2 = objFSO.GetFile("skaters.txt")
Set objRead = objFile2.OpenAsTextStream( 1, -2 )
So we've opened the file for reading, and created a file to hold the extracted data as comma-separated values. Now we just need to define exactly what we want. I had it easy because the columns of data that I wanted to start with all had a similar class. So I just skipped through all the content in the head and body of the web page that occurred before the data and then started my work. You may have to examine the downloaded file and figure out your own plan of attack.
'=== skip ahead until a line starts with the search string, or the file ends
Do Until Left(thisLine, Len(strSearch)) = strSearch Or objRead.AtEndOfStream
    thisLine = objRead.ReadLine
Loop
Now I do the extraction of the first name, last name, and points. This could be somewhat problematic if the information you seek is not in such a well defined format. But mine was, so here's how I did it:
While Not objRead.AtEndOfStream
    If Left(thisLine, Len(strSearch)) = strSearch Then
        '=== name
        thisLine = objRead.ReadLine
        '=== trimming
        thisLine = Mid(thisLine, InStr(thisLine, "<a"))
        thisLine = Mid(thisLine, InStr(thisLine, """>") + 2)
        thisLine = Left(thisLine, InStr(thisLine, "</a>") - 1)
        '=== extract first name
        firstName = Left(thisLine, InStr(thisLine, " ") - 1)
        '=== extract last name
        lastName = Right(thisLine, Len(thisLine) - Len(firstName) - 1)
    End If
    '=== now search for points column
    If InStr(thisLine, "td class=""ysptblclbg6""") > 0 Then 'points
        '=== trim
        thisLine = Mid(thisLine, InStr(thisLine, "<span class=""yspscores"">") + 24)
        thisLine = Left(thisLine, InStr(thisLine, "<") - 1)
        '=== extract points
        pts = thisLine
        objWrite.WriteLine( firstName & "," & lastName & "," & pts )
    End If
    thisLine = objRead.ReadLine
Wend
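As mentioned earlier, regular expressions are probably the better tool for this kind of extraction. The sketch below shows how the name-trimming step might look with VBScript's built-in RegExp object. It is an illustration only: the function name ExtractLinkText and the sample line are hypothetical, and the pattern assumes the name sits inside a single link on one line.

```vbscript
' Hypothetical helper: returns the text inside the first <a>...</a> pair
' on a line, or an empty string if the line contains no link.
Function ExtractLinkText(line)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = "<a\b[^>]*>([^<]*)</a>"
    re.IgnoreCase = True
    re.Global = False               ' only the first match is needed
    Set matches = re.Execute(line)
    If matches.Count > 0 Then
        ExtractLinkText = matches(0).SubMatches(0)
    Else
        ExtractLinkText = ""
    End If
    Set matches = Nothing
    Set re = Nothing
End Function

' Usage with a made-up line in the shape described above:
' fullName = ExtractLinkText("<td><a href=""/players/1"">Joe Sakic</a></td>")
```

One pattern replaces the three Mid/Left trimming steps, and a line without a link yields an empty string instead of a runtime error from InStr returning 0.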
Clean-Up
I admit, cleaning up is one of my biggest downfalls. Dirty socks on the floor, empty glasses on the table, and rogue variables left hanging in memory space. Of course, with a script so small, it won't bring your server to its knees, but it's a good habit to get rid of the trash as soon as it becomes such. So try something like this at the end of your script:
Set objShell = Nothing
Set objFile = Nothing
Set objWrite = Nothing
Set objRead = Nothing
Set objFSO = Nothing
Set objXML = Nothing
Conclusion
This was not an in-depth discussion of HTTP tunneling; we just made use of it to suit our needs. But if you are the inquisitive type that really needs to know more of what's going on, some recommended reading is HTTP Tunneling Revealed (http://www.devarticles.com/c/a/ASP/HTTP-Tunneling-Revealed-Part-1/).
In case you want to know, I did get that nice dinner. I enjoyed some amazing scallops as an appetizer, a nice venison entree, and one fantastic bottle of wine (between the two of us naturally). My girlfriend had Arctic Char, but wasn't incredibly satisfied with it. I don't remember what we had for dessert, something chocolaty no doubt.
I hope this provides a solution to many a question I've seen posted here and all over the web. I can promise you'll find it handy; I can't promise the venison, however. Sorry.