Getting Remote Pages with ASP - Let's Code
(Page 2 of 4)
As always, we'll begin at the beginning. Let us first declare all variables for use within the script, and instantiate some of the objects that we'll be using:
' Declare the variables used throughout the script
Dim objRead, objWrite, objShell, objXML, objFile, objFSO
Dim location, thisFolder, skaters, strURL
' File access, a shell (to run ProxyCfg.exe), and the HTTP request object
Set objFSO = Server.CreateObject("Scripting.FileSystemObject")
Set objShell = Server.CreateObject("WScript.Shell")
Set objXML = Server.CreateObject("MSXML2.ServerXMLHTTP")
If you receive an error at this point to the effect of "CreateObject Failed", there's a very good chance that the XML Parser is not yet properly installed. This only leads me to believe that you skipped over the 'Before We Proceed' section. Now would be a great time to go back there, and install the necessary components before you proceed!
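If you'd rather have the page report a missing component than stop on a CreateObject error, one option is to probe for the class first. This is a minimal sketch: the helper name CanCreate is hypothetical, and it uses VBScript's plain CreateObject so it behaves the same in an ASP page and in a .vbs file.

```vbscript
' Hypothetical helper: returns True if the COM class named by progId can be
' instantiated on this machine, False if CreateObject raises an error
Function CanCreate(progId)
    Dim obj
    On Error Resume Next
    Set obj = CreateObject(progId)
    CanCreate = (Err.Number = 0)
    Err.Clear
    On Error GoTo 0
    Set obj = Nothing
End Function
```

For example, checking CanCreate("MSXML2.ServerXMLHTTP") before the Set objXML line lets you print a friendly message with Response.Write and stop with Response.End instead of showing a raw error.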
I just want to quickly relate how the next bit of code averted a near disaster. I initially designed the script at home, and it worked wonderfully. I emailed it to my coworker without testing it in my work environment, making the assumption, as I'm sure many do, that if it works here, it will work anywhere. But as Murphy's Law would dictate, it did not work for him. Not at all.
It was quickly deduced that our proxy server was preventing my wonderful invention from working. Beads of sweat were forming on my brow. The probability of enjoying that fine dinner suddenly seemed very... improbable.
After some digging, I found that Microsoft offers a little tool for exactly this problem: ProxyCfg.exe, the Proxy Configuration Tool. Since you've already installed it per my earlier instructions, let's put it to work!
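'=== ASP
Dim strPath, objScript
' Physical path of the .asp page that is currently running
strPath = Server.MapPath( Request.ServerVariables("SCRIPT_NAME") )
Set objScript = objFSO.GetFile( strPath )
' Folder path as a string (note: it has no trailing backslash)
thisFolder = objScript.ParentFolder.Path
'=== VBS
'thisFolder = objFSO.GetParentFolderName( WScript.ScriptFullName )
' BuildPath adds the backslash; the quotes protect folder names with spaces
objShell.Run( """" & objFSO.BuildPath( thisFolder, "proxycfg.exe" ) & """ -u" )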
It's that simple! However, there is plenty of room for error here, because I've simply left ProxyCfg.exe in the same folder as the script. If that folder is in any way protected or inaccessible to the script, the file will not run and an error will be generated. Make sure ProxyCfg.exe is in an accessible folder!
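One way to fail more gracefully is to confirm that proxycfg.exe is actually there before calling Run. The sketch below is a hypothetical helper (the name ProxyCfgCommand is an assumption, not part of the original script); note that it only catches a missing file, not a folder the script lacks permission to execute from.

```vbscript
' Hypothetical helper: builds the ProxyCfg.exe command line for the given
' folder, or returns "" when proxycfg.exe is not present there
Function ProxyCfgCommand(folderPath)
    Dim fso, exePath
    Set fso = CreateObject("Scripting.FileSystemObject")
    ' BuildPath inserts the backslash between folder and file name
    exePath = fso.BuildPath(folderPath, "proxycfg.exe")
    If fso.FileExists(exePath) Then
        ' Quote the path in case the folder name contains spaces
        ProxyCfgCommand = """" & exePath & """ -u"
    Else
        ProxyCfgCommand = ""
    End If
End Function
```

In the page, you would then call objShell.Run only when ProxyCfgCommand(thisFolder) returns a non-empty string, and report the problem otherwise.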
Also, to save the hassle of manually typing the proxy settings in with the proper switches, we just pass the tool -u. This tells it to import Internet Explorer's proxy settings and mimic them. Easy enough! Oh, and if you're doing this from a VBS file instead of ASP, you may want to include the next line to allow time to breathe whilst ProxyCfg.exe runs (WScript.Sleep takes milliseconds, so this is a four-second pause).
WScript.Sleep( 4000 )
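As an alternative to a fixed pause, WScript.Shell's Run method takes an optional third argument, bWaitOnReturn; when it is True, Run waits for the program to exit and returns its exit code. A minimal sketch, with RunAndWait as a hypothetical name:

```vbscript
' Sketch: run a command with a hidden window (style 0) and wait for it to
' finish (bWaitOnReturn = True); Run then returns the process's exit code
Function RunAndWait(cmd)
    Dim shell
    Set shell = CreateObject("WScript.Shell")
    RunAndWait = shell.Run(cmd, 0, True)
End Function
```

Calling RunAndWait with the ProxyCfg command line makes the script continue only once the tool has finished, however long that takes. The meaning of ProxyCfg.exe's individual exit codes isn't covered here, so treat a non-zero result only as a hint that something went wrong.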
Now that we've ensured passage through the Proxy server, we can define the URL of the page we want to get, and go about getting it.
strURL = "http://sports.yahoo.com/nhl/stats/byposition?pos=C,RW,LW,D&conference=NHL"
objXML.Open "GET", strURL, False
objXML.Send()
skaters = objXML.responseText
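The three lines above assume the request succeeds. A slightly more defensive variant, offered as a sketch rather than the article's method, checks the HTTP status before trusting responseText and sets explicit timeouts so a slow server can't hang the page. The function name FetchPage and the timeout values are illustrative assumptions.

```vbscript
' Sketch: GET a URL with ServerXMLHTTP and return the body only when the
' server answers 200 OK; any network error or other status yields ""
Function FetchPage(url)
    Dim http, failed
    Set http = CreateObject("MSXML2.ServerXMLHTTP")
    ' Resolve, connect, send and receive timeouts in milliseconds
    ' (illustrative values); setTimeouts must be called before Open
    http.setTimeouts 5000, 5000, 10000, 10000
    On Error Resume Next
    http.Open "GET", url, False
    http.Send
    failed = (Err.Number <> 0)
    Err.Clear
    On Error GoTo 0
    FetchPage = ""
    ' Only read Status after a successful Send
    If Not failed Then
        If http.Status = 200 Then FetchPage = http.responseText
    End If
    Set http = Nothing
End Function
```

With this helper, skaters = FetchPage(strURL) followed by a check for an empty string keeps an error page or a timeout from being saved as if it were real data.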
You should experience no errors here, because you've already installed the XML Parser, right? OK, now that we have the page stored in a variable called skaters, let's save it locally.
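' Build an absolute path: in ASP a bare file name is not guaranteed to
' resolve to this script's folder
Dim strFile
strFile = Server.MapPath("skaters.txt")
If Not objFSO.FileExists(strFile) Then objFSO.CreateTextFile(strFile)
Set objFile = objFSO.GetFile(strFile)
' 2 = ForWriting, -2 = TristateUseDefault (system default format)
Set objWrite = objFile.OpenAsTextStream( 2, -2 )
'Response.Write( skaters )
'Response.End()
objWrite.Write( skaters )
objWrite.Close()
Set objWrite = Nothing
Set objFile = Nothing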
Here we've simply saved the info to a text file in the same directory. This is only for convenience. You can, of course, save it as a .htm file if, for example, you want to view the saved page in a browser. You could also build a fancy directory structure to reflect the hierarchy of a CMS's published pages. Or really, you can do whatever you wish with the captured file; I'm not going to stop you!
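If you do mirror a site's hierarchy, the target folders must exist before CreateTextFile can write into them; otherwise it fails with a "Path not found" error. A minimal sketch of a helper that creates a folder along with any missing parents (EnsureFolder is a hypothetical name; in ASP you would pass it a physical path obtained from Server.MapPath):

```vbscript
' Hypothetical helper: creates folderPath along with any missing parent
' folders; does nothing if the folder already exists
Sub EnsureFolder(fso, folderPath)
    If folderPath = "" Then Exit Sub
    If fso.FolderExists(folderPath) Then Exit Sub
    ' Create the parent first, then this folder
    EnsureFolder fso, fso.GetParentFolderName(folderPath)
    fso.CreateFolder folderPath
End Sub
```

Recursing on GetParentFolderName walks up until it reaches a folder that already exists, then creates each missing level on the way back down.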
You may receive some crazy, buggy error here, almost invariably on the objWrite.Write() line. Try uncommenting the two lines above it; you should see the received page just fine. But look closely: do you see question marks in odd places? That has to do with the encoding of the page; it may contain non-English Unicode characters.
If this happens, the only hope is to have some control over the content you're working with. It must be scrubbed of these 'illegal' characters, which are replaced with their ISO-8859-1 equivalents. Then, and only then, will you be able to work with it. I have created a function to do this; feel free to write me for it.
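The author's own function isn't reproduced here. As a different, hypothetical approach, the sketch below replaces every character outside 7-bit ASCII with an HTML numeric character reference (a right single quotation mark, for example, becomes &#8217;). The result is pure ASCII, so it can be written to an ASCII-mode file and still displays correctly when the saved file is viewed as HTML. The name ToAsciiWithEntities is an assumption.

```vbscript
' Hypothetical helper (not the author's function): replaces every character
' outside 7-bit ASCII with an HTML numeric character reference
Function ToAsciiWithEntities(text)
    Dim i, ch, code, result
    result = ""
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        code = AscW(ch)
        ' AscW returns a signed 16-bit value; map negatives back to 0..65535
        If code < 0 Then code = code + 65536
        If code > 127 Then
            ' Characters outside the Basic Multilingual Plane arrive as two
            ' surrogate halves and are encoded separately, so they will
            ' not display correctly
            result = result & "&#" & code & ";"
        Else
            result = result & ch
        End If
    Next
    ToAsciiWithEntities = result
End Function
```

Usage would be objWrite.Write( ToAsciiWithEntities(skaters) ). Alternatively, passing -1 (TristateTrue) instead of -2 to OpenAsTextStream writes the file as Unicode, which avoids the Write error without altering the text at all.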
More By Justin Cook