Creating a Personal Search Engine by Sixto Luis Santos
By: aspfree
    2000-10-01


    Search facilities have become an expected part of every web site, but relying on an outside service to provide one is not always possible. For example, if yours is a personal web site that is not always connected to the Internet, or you are in charge of an Intranet with confidential information, you may not want to, or may not be able to, use the site indexing capabilities of commercial search engines like Beseen or Altavista. That is exactly why we set out to implement a simple text search facility with the tools that we already have: an ASP-capable web server and the objects available to VBScript.

    Our solution is based principally on two objects available to VBScript: the FileSystemObject, in charge of retrieving the target pages' text, and the RegExp object, which does the actual search and extracts the document's title. We encapsulated the search functionality within two self-contained procedures to allow us flexibility in the search page design. This means that you can change the search page to match the look and feel of your site without requiring major changes in the code.

    Figure 1 - Our search engine in action...
    ~~*~~

    Our program relies heavily on the RegExp object. This object allows us to do search or search-and-replace operations using 'Regular Expressions'. A regular expression is a pattern of text that consists of ordinary characters and special characters, known as metacharacters. The pattern describes one or more strings to match when searching a body of text, so the regular expression serves as a template for matching a character pattern to the string being searched. For more information on Regular Expressions and Scripting Technologies in general, please refer to the Microsoft Scripting Technologies web site at http://msdn.microsoft.com/scripting/default.htm.
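
    To make the idea of a pattern concrete, here is a minimal sketch of the RegExp object on its own, outside the search engine. The helper name CountMatches is our own invention for illustration and does not appear in the listings.

```vbscript
' Minimal sketch: count how many times a pattern occurs in a string.
' CountMatches is a hypothetical helper name, not part of the article's code.
Function CountMatches(Pattern, Text)
    Dim re
    Set re = New RegExp
    re.Global = True        ' find every match, not just the first
    re.IgnoreCase = True    ' case-insensitive matching
    re.Pattern = Pattern
    CountMatches = re.Execute(Text).Count
    Set re = Nothing
End Function

' The metacharacter . stands for any single character, so "a.p"
' matches "asp" and "ASP" but also "a-p".
Dim n
n = CountMatches("a.p", "asp, ASP and a-p")   ' n is 3
```

    The ordinary characters a and p must appear literally, while the metacharacter in the middle accepts anything except a newline.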

    We begin with the starting procedure, the one we call to launch the search process. It takes a single parameter, SearchString, which holds our search criteria.

    First, we do a standard instantiation of the objects. Second, we set up the two RegExp objects, and here's where the magic begins. Setting a RegExp's Global property to True instructs the object to find every match of our search pattern. If we set it to False, as we do for the GetTitle object, the search stops at the first match found. The IgnoreCase property is self-explanatory: it instructs the object to do case-insensitive searches. The Pattern property is where we state the search expression.

    Note the difference between Regex.Pattern and GetTitle.Pattern in Listing 1. In the former we just feed in the content of the SearchString parameter as it came from the user, which means that any metacharacters the user types are interpreted as regular expression syntax. In the latter we construct a special pattern to match text enclosed in <title> tags. Observe the metacharacters between <title> and </title>. The parentheses group the alternatives; the . matches any single character except the newline character (vbLf in VBScript, not the two-character vbCrLf); the \n matches the newline character; the pipe character | in between indicates an or; and the asterisk * matches zero or more occurrences of the preceding group. In summary, this pattern will match anything (any number of characters or newlines) between <title> and </title>.

    Third, we make sure that our path variables end with their trailing slashes, as we will be using them as the base paths for our matched documents. Fourth, we start the actual search process by calling the SearchFiles procedure. Fifth and last, we display a message if no matches were found and do some object cleaning. The code for our starting procedure follows.

    Listing 1 - Starting Procedure

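    <%

    ' Entry point for the search. The variables fs, Regex, GetTitle, RootFld,
    ' RootFolder, rfLen and MatchedCount are not declared here, so they are
    ' assumed to be page-level variables (not shown) shared with SearchFiles.
    Sub Search(SearchString)

        ' Instantiate the objects
        Set fs = CreateObject("Scripting.FileSystemObject")
        Set GetTitle = New RegExp
        Set Regex = New RegExp

        ' The user's criteria: find every match, ignoring case
        With Regex
            .Global = True
            .IgnoreCase = True
            .Pattern = Trim(SearchString)
        End With

        ' The document title: stop at the first match
        With GetTitle
            .Global = False
            .IgnoreCase = True
            .Pattern = "<title>(.|\n)*</title>"
        End With

        ' Translate the virtual root into a physical path
        RootFolder = Server.MapPath(RootFld)

        ' Make sure both paths end with a trailing slash
        If Right(RootFld, 1) <> "/" Then
            RootFld = RootFld & "/"
        End If

        If Right(RootFolder, 1) <> "\" Then
            RootFolder = RootFolder & "\"
        End If
        ' Position where the path relative to the root begins
        rfLen = Len(RootFolder) + 1

        ' Start the recursive search
        SearchFiles RootFolder

        If MatchedCount = 0 Then
            Response.Write "&nbsp;&nbsp;<B>No Matches Found.</b><BR>"
        End If

        ' Clean up the objects
        Set Regex = Nothing
        Set GetTitle = Nothing
        Set fs = Nothing

    End Sub

    %>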
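
    Because the starting procedure passes the user's text to the Pattern property unchanged, a search for something like "C++" is read as regular expression syntax rather than as plain text. If you would rather treat the criteria as literal text, one option is to escape the metacharacters first. The following is a minimal sketch under that assumption; EscapeRegex is a hypothetical helper of our own and is not part of the original listings.

```vbscript
' Hypothetical helper: put a backslash in front of every regular expression
' metacharacter so that the text is matched literally.
Function EscapeRegex(Text)
    Const MetaChars = "\^$.|?*+()[]{}"
    Dim i, ch, result
    result = ""
    For i = 1 To Len(Text)
        ch = Mid(Text, i, 1)
        If InStr(MetaChars, ch) > 0 Then
            result = result & "\" & ch
        Else
            result = result & ch
        End If
    Next
    EscapeRegex = result
End Function

' Example: the result is C\+\+ \(draft\)
Dim escaped
escaped = EscapeRegex("C++ (draft)")
```

    With this helper in place, the assignment in the starting procedure could read .Pattern = EscapeRegex(Trim(SearchString)). The trade-off is that users can then no longer type their own regular expressions.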

    ~~*~~

    The next part of our project is the search engine itself. The engine takes the form of a self-calling procedure, otherwise known as a recursive procedure. We decided to implement it this way to simplify the process of traversing a directory tree. Note that in a recursive procedure, a new and independent set of local variables and objects is created each time the procedure is called.

    First, we get the current 'root' folder, where files and other folders may exist. Then we iterate thru each file in the folder and compare its extension to a global variable (not shown) holding a list of extensions for valid files (e.g. html, asp, txt, etc.). If the extension is in the list, the file is opened to get the text contained inside, and the RegExp search is applied. If the search returns one or more matches, we try to get hold of the document's title by executing the GetTitle RegExp search. This, of course, will only return something for HTML and some ASP files. If we find a title, we use it as our results entry text; otherwise we use the file name.

    Note that we need to strip out the <title> tags ourselves. In version 5.5 of the scripting engine (as found in Windows 2000) a SubMatches collection is available. It returns the text matched by each captured match (a part of the pattern enclosed in parentheses), avoiding the need to strip the tags manually. Unfortunately, there's no SubMatches collection in the more popular versions 4 or 5 of the scripting engine. Anyway, once we have our entry's name, we construct the line that will be displayed on our results page. We add some miscellaneous (also known as fancy or mostly useless) information to the entry, and do some HTML formatting as we go. Check out the somewhat commented code for the recursive procedure below.

    Listing 2 - Recursive Search Procedure

    <%

    Sub SearchFiles(FolderPath)
        Dim fsFolder
        Dim fsFolder2
        Dim fsFile
        Dim fsText
        Dim FileText
        Dim FileTitle
        Dim FileTitleMatch
        Dim MatchCount
        Dim OutputLine

        ' Get the starting folder
        Set fsFolder = fs.GetFolder(FolderPath)
        ' Iterate thru every file in the folder
        For Each fsFile In fsFolder.Files
            ' Compare the current file extension with the list of valid target files
            If InStr(1, ValidFiles, Right(fsFile.Name, 3), vbTextCompare) > 0 Then
                DocCount = DocCount + 1
                ' Read the file's content. ReadAll raises an error on an empty
                ' file, so zero-length files are treated as empty text.
                FileText = ""
                If fsFile.Size > 0 Then
                    Set fsText = fsFile.OpenAsTextStream
                    FileText = fsText.ReadAll
                    fsText.Close
                End If
                ' Apply the regex search and get the count of matches found
                MatchCount = Regex.Execute(FileText).Count
                MatchedCount = MatchedCount + MatchCount
                If MatchCount > 0 Then
                    DocMatchCount = DocMatchCount + 1
                    ' Apply another regex to get the html document's title
                    Set FileTitleMatch = GetTitle.Execute(FileText)
                    If FileTitleMatch.Count > 0 Then
                        ' Strip the title tags ("<title>" is 7 characters long)
                        FileTitle = Trim(Replace(Mid(FileTitleMatch.Item(0).Value, 8), "</title>", "", 1, 1, vbTextCompare))
                        ' In case the title is empty
                        If FileTitle = "" Then
                            FileTitle = "No Title (" & fsFile.Name & ")"
                        End If
                    Else
                        ' Create an alternate entry name (if no title found)
                        FileTitle = "No Title (" & fsFile.Name & ")"
                    End If
                    ' Create the entry line with proper formatting
                    ' Add the entry number
                    OutputLine = "&nbsp;&nbsp;<b>" & DocMatchCount & ".</B>&nbsp;"
                    ' Add the document name and link
                    OutputLine = OutputLine & "<A href=" & Chr(34) & RootFld & Replace(Mid(fsFile.Path, rfLen), "\", "/") & Chr(34) & "><B>"
                    OutputLine = OutputLine & FileTitle & "</B></a>"
                    ' Add the document information
                    OutputLine = OutputLine & "<font size=1><br>&nbsp;&nbsp;Criteria matched " & MatchCount & " times - Size: "
                    OutputLine = OutputLine & FormatNumber(fsFile.Size / 1024, 2, -1, 0, -1) & "K bytes"
                    OutputLine = OutputLine & " - Last Modified: " & FormatDateTime(fsFile.DateLastModified, vbShortDate) & "</Font><br>"
                    ' Display entry
                    Response.Write OutputLine
                    Response.Flush
                End If
            End If
        Next

        ' Iterate thru each subfolder and recursively call this procedure
        For Each fsFolder2 In fsFolder.SubFolders
            SearchFiles fsFolder2.Path
        Next

        ' Do some objects clean-up
        Set FileTitleMatch = Nothing
        Set fsText = Nothing
        Set fsFile = Nothing
        Set fsFolder2 = Nothing
        Set fsFolder = Nothing
    End Sub

    %>
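
    For readers who do have version 5.5 or later of the scripting engine, the manual tag stripping could be replaced with the SubMatches collection mentioned earlier. What follows is only a sketch under that assumption. ExtractTitle is a hypothetical helper of our own, and its pattern is deliberately simpler than the one in Listing 1: it assumes the title contains no < character and that the <title> tag carries no attributes.

```vbscript
' Sketch only: requires the SubMatches collection (scripting engine 5.5 or later).
' ExtractTitle is a hypothetical helper, not part of the article's listings.
Function ExtractTitle(HtmlText)
    Dim re, matches
    Set re = New RegExp
    re.Global = False       ' the first title is enough
    re.IgnoreCase = True
    ' The parentheses capture the text between the tags.
    ' [^<]* matches any run of characters other than "<", line breaks included.
    re.Pattern = "<title>([^<]*)</title>"
    Set matches = re.Execute(HtmlText)
    If matches.Count > 0 Then
        ' SubMatches(0) holds what the first pair of parentheses captured
        ExtractTitle = Trim(matches(0).SubMatches(0))
    Else
        ExtractTitle = ""
    End If
    Set matches = Nothing
    Set re = Nothing
End Function

' Example: pageTitle is "My Page"
Dim pageTitle
pageTitle = ExtractTitle("<html><head><TITLE> My Page </TITLE></head></html>")
```

    An empty return value would then trigger the same "No Title" fallback that the recursive procedure already uses.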

    As you can see, it is very easy to create a simple search engine without spending big bucks on third-party solutions. Bear in mind that this is a very simplistic approach to the search engine problem. Aside from the absent-minded nature of this engine (it will match text inside code procedures or inside HTML tags, something not always desirable), a robust solution would index each file in a separate process and store the information in a database for fast retrieval. Even so, the solution presented here is sure to satisfy many web developers in need of a simple search facility, and it certainly demonstrates what can be done with the sometimes neglected tools available in every ASP developer's toolbox.
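
    One way to soften the "matches inside HTML tags" limitation would be to remove the tags from the file's text before applying the search. The following is a rough sketch of that idea, not a complete solution. StripTags is a hypothetical helper of our own, and it treats anything from a < up to the next > as a tag, so it will not cope well with script blocks or with ASP code that uses > as an operator.

```vbscript
' Hypothetical helper: replace every HTML tag with a space so that the
' search only sees the text between the tags.
Function StripTags(HtmlText)
    Dim re
    Set re = New RegExp
    re.Global = True        ' remove every tag, not just the first
    re.IgnoreCase = True
    ' Anything from "<" up to the next ">" is treated as a tag
    re.Pattern = "<[^>]*>"
    StripTags = re.Replace(HtmlText, " ")
    Set re = Nothing
End Function

' Example: the attribute value "search.asp" no longer appears in the result
Dim visibleText
visibleText = StripTags("<a href=""search.asp"">Find it</a>")
```

    In the recursive procedure, the criteria could then be applied to StripTags(FileText) instead of FileText, while the title search would still need the original text.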

    Feel free to send your comments and suggestions to sixtos@prtc.net (threat mail is strongly discouraged).

