Creating a Personal Search Engine by Sixto Luis Santos
By: aspfree
    2000-10-01


    Search facilities have become an expected part of every web site, but providing one is not always straightforward. For example, if yours is a personal web site that is not always connected to the Internet, or you are in charge of an intranet with confidential information, you may not want to, or may not be able to, rely on the site indexing capabilities of commercial search engines like Beseen or AltaVista. That is exactly why we set out to implement a simple text search facility with the tools we already have: an ASP-capable web server and the objects available to VBScript.

    Our solution is based principally on two objects available to VBScript: the FileSystemObject, which retrieves the target pages' text, and the RegExp object, which performs the actual search and extracts each document's title. We encapsulated the search functionality within two self-contained procedures to allow us flexibility in the search page design. This means that you can change the search page to match the look and feel of your site without requiring major changes in the code.

    Figure 1 - Our search engine in action...
    ~~*~~

    Our program relies heavily on the RegExp object. This object allows us to do search, or search-and-replace, operations using 'Regular Expressions'. A regular expression is a pattern of text that consists of ordinary characters and special characters, known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression serves as a template for matching a character pattern to the string being searched. For more information on Regular Expressions and Scripting Technologies in general please refer to the Microsoft Scripting Technologies web site at http://msdn.microsoft.com/scripting/default.htm.
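
    To make the idea concrete, here is a minimal sketch of the RegExp object at work. The pattern and the sample text are our own illustrations, and the script uses WScript.Echo so that it can be run from the command line with cscript; inside an ASP page you would use Response.Write instead.

```vbscript
Option Explicit

Dim re, matches, m

Set re = New RegExp
re.Pattern = "colou?r"     ' the ? metacharacter makes the "u" optional
re.IgnoreCase = True       ' "Colour" and "colour" are treated alike
re.Global = True           ' find every match, not just the first

' Test only reports whether the pattern occurs in the text
WScript.Echo CStr(re.Test("Colour and color"))

' Execute returns a Matches collection, one Match object per hit
Set matches = re.Execute("Colour and color")
WScript.Echo "Matches found: " & matches.Count

For Each m In matches
    ' FirstIndex is the zero-based position of the match in the text
    WScript.Echo m.FirstIndex & ": " & m.Value
Next

Set matches = Nothing
Set re = Nothing
```

    With Global set to False, Execute would stop after the first match and the collection would hold a single item.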

    We begin by creating our starting procedure. This is the procedure we call to start the search process. It takes a single parameter, SearchString, that holds our search criteria.

    First, we do a standard instantiation of the objects. Second, we set up the RegExp objects, and here's where the magic begins. By setting the RegExp's Global property to True we instruct the object to find every match of our search pattern. If we set it to False, as is the case for the GetTitle object, the search stops at the first match found. The IgnoreCase property should be self-explanatory: it simply instructs the object to do case-insensitive searches. The Pattern property is where we state the search expression. Note the difference between Regex.Pattern and GetTitle.Pattern below. In the former we just feed in the content of the SearchString parameter as it came from the user, which means that any metacharacters the user types are interpreted as part of a regular expression. In the latter we construct a special pattern to match text enclosed in <title> tags.

    Observe in the code below the special metacharacters right between <title> and </title>. We use parentheses to group the alternatives. The . matches any single character except the newline character (the line feed, vbLf in VBScript; a Windows line break, vbCrLf, is a carriage return followed by this character), and the \n matches that newline character. The pipe character | in between indicates an or, and the asterisk * indicates to match zero or more occurrences of the preceding group. In summary, this pattern will match anything (any number of characters or new lines) between <title> and </title>.

    Third, we make sure that our path variables contain their trailing slashes, as we will be using them as the base paths for our matched documents. Fourth, we start the actual search process by calling the SearchFiles procedure. Fifth and last, we display a message if no matches were found and we do some object cleanup. Find below the code for our starting procedure.

    Listing 1 - Starting Procedure

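    <%

    ' Starts the search. The variables fs, Regex, GetTitle, RootFld,
    ' RootFolder, rfLen and MatchedCount are assumed to be declared at
    ' page level (not shown) so that SearchFiles can see them too.
    Sub Search(SearchString)

        Set fs = CreateObject("Scripting.FileSystemObject")
        Set GetTitle = New RegExp
        Set Regex = New RegExp

        ' The user's criteria: find every match, ignoring case
        With Regex
            .Global = True
            .IgnoreCase = True
            .Pattern = Trim(SearchString)
        End With

        ' The document title: stop at the first match
        With GetTitle
            .Global = False
            .IgnoreCase = True
            .Pattern = "<title>(.|\n)*</title>"
        End With

        ' Translate the virtual root into a physical path
        RootFolder = Server.MapPath(RootFld)

        ' Make sure both paths end with their trailing slash
        If Right(RootFld, 1) <> "/" Then
            RootFld = RootFld & "/"
        End If

        If Right(RootFolder, 1) <> "\" Then
            RootFolder = RootFolder & "\"
        End If

        ' Position where the relative part of each file path begins
        rfLen = Len(RootFolder) + 1

        SearchFiles RootFolder

        If MatchedCount = 0 Then
            Response.Write "&nbsp;&nbsp;<B>No Matches Found.</b><BR>"
        End If

        Set Regex = Nothing
        Set GetTitle = Nothing
        Set fs = Nothing

    End Sub

    %>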
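
    Because the user's text is handed to Regex.Pattern unchanged, characters such as +, ( or ? are treated as metacharacters, and an incomplete expression can raise a runtime error when the search runs. If you would rather offer a plain-text search, one option is to escape those characters first. The EscapePattern helper below is a hypothetical addition, not part of the original listings; the demonstration lines use WScript.Echo and assume the script is run with cscript.

```vbscript
Option Explicit

' Hypothetical helper (not from the original article): puts a backslash
' in front of every regular expression metacharacter so that the text
' is matched literally.
Function EscapePattern(text)
    Const META = "\^$.|?*+()[]{}"
    Dim i, ch, result

    result = ""
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        If InStr(META, ch) > 0 Then result = result & "\"
        result = result & ch
    Next

    EscapePattern = result
End Function

Dim re
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = EscapePattern("C++ (beta)")

WScript.Echo re.Pattern                            ' C\+\+ \(beta\)
WScript.Echo CStr(re.Test("Notes on c++ (beta)"))  ' True

Set re = Nothing
```

    In Listing 1 this would amount to writing .Pattern = EscapePattern(Trim(SearchString)), at the cost of giving up regular expression searches for the user.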

    ~~*~~

    The next part of our project is the search engine itself. The engine takes the form of a self-calling procedure, otherwise known as a recursive procedure. We decided to implement the engine this way to simplify the process of traversing a directory tree. Note that in a recursive procedure, a new and independent set of local variables and objects is created each time it is called.

    First, we get the current 'root' folder where files and other folders may exist. Then we iterate through each file in the folder, comparing each file's extension to a global variable (not shown) that holds a list of extensions for valid files (e.g. html, asp, txt, etc.). If a match is found, the file is opened to get the text contained inside, and the RegExp search is applied. If the search returns one or more matches, we then try to get hold of the document's title by executing the GetTitle RegExp search. This, of course, will only return something for HTML and some ASP files. If we find a title, we use it as our results entry text; otherwise we use the file name.

    Note that we need to strip out the <title> tags. In version 5.5 of the scripting engine (as found in Windows 2000) a SubMatches collection is available, which returns what's inside the so-called captured matches (the parts of a pattern enclosed in parentheses), avoiding the need to prepare the match manually. Unfortunately, there's no SubMatches collection in the more popular versions 4 or 5 of the scripting engine.

    Anyway, once we have our entry's name, we proceed to construct the line that will be displayed on our results page. We add some miscellaneous (also known as fancy or mostly useless) information to the entry, and do some HTML formatting as we go. Check out the somewhat commented code of the recursive procedure below.

    Listing 2 - Recursive Search Procedure

    <%

    Sub SearchFiles(FolderPath)
        Dim fsFolder
        Dim fsFolder2
        Dim fsFile
        Dim fsText
        Dim FileText
        Dim FileTitle
        Dim FileTitleMatch
        Dim MatchCount
        Dim OutputLine

        ' Get the starting folder
        Set fsFolder = fs.GetFolder(FolderPath)
        ' Iterate through every file in the folder
        For Each fsFile In fsFolder.Files
            ' Compare the current file extension with the list of valid target files
            If InStr(1, ValidFiles, Right(fsFile.Name, 3), vbTextCompare) > 0 Then
                DocCount = DocCount + 1
                ' Open the file to read its content
                Set fsText = fsFile.OpenAsTextStream
                ' ReadAll raises an error on an empty file, so check first
                If fsText.AtEndOfStream Then
                    FileText = ""
                Else
                    FileText = fsText.ReadAll
                End If
                ' Apply the regex search and get the count of matches found
                MatchCount = Regex.Execute(FileText).Count
                MatchedCount = MatchedCount + MatchCount
                If MatchCount > 0 Then
                    DocMatchCount = DocMatchCount + 1
                    ' Apply another regex to get the html document's title
                    Set FileTitleMatch = GetTitle.Execute(FileText)
                    If FileTitleMatch.Count > 0 Then
                        ' Strip the title tags ("<title>" is 7 characters long)
                        FileTitle = Trim(Replace(Mid(FileTitleMatch.Item(0), 8), "</title>", "", 1, 1, 1))
                        ' In case the title is empty
                        If FileTitle = "" Then
                            FileTitle = "No Title (" & fsFile.Name & ")"
                        End If
                    Else
                        ' Create an alternate entry name (if no title found)
                        FileTitle = "No Title (" & fsFile.Name & ")"
                    End If
                    ' Create the entry line with proper formatting
                    ' Add the entry number
                    OutputLine = "&nbsp;&nbsp;<b>" & DocMatchCount & ".</B>&nbsp;"
                    ' Add the document name and link
                    OutputLine = OutputLine & "<A href=" & Chr(34) & RootFld & Replace(Mid(fsFile.Path, rfLen), "\", "/") & Chr(34) & "><B>"
                    OutputLine = OutputLine & FileTitle & "</B></a>"
                    ' Add the document information
                    OutputLine = OutputLine & "<font size=1><br>&nbsp;&nbsp;Criteria matched " & MatchCount & " times - Size: "
                    OutputLine = OutputLine & FormatNumber(fsFile.Size / 1024, 2, -1, 0, -1) & "K bytes"
                    OutputLine = OutputLine & " - Last Modified: " & FormatDateTime(fsFile.DateLastModified, vbShortDate) & "</Font><br>"
                    ' Display entry
                    Response.Write OutputLine
                    Response.Flush
                End If
                fsText.Close
            End If
        Next

        ' Iterate through each subfolder and recursively call this procedure
        For Each fsFolder2 In fsFolder.SubFolders
            SearchFiles fsFolder2.Path
        Next

        ' Do some object clean-up
        Set FileTitleMatch = Nothing
        Set fsText = Nothing
        Set fsFile = Nothing
        Set fsFolder2 = Nothing
        Set fsFolder = Nothing
    End Sub

    %>
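
    For readers running version 5.5 or later of the scripting engine, the manual stripping of the <title> tags can be replaced with the SubMatches collection mentioned earlier. The sketch below is an alternative under stated assumptions, not the method used in Listing 2: ExtractTitle is a hypothetical helper, and its pattern adds a capturing group and a non-greedy quantifier (*?), which also requires version 5.5. The last line uses WScript.Echo for demonstration under cscript.

```vbscript
Option Explicit

' Hypothetical helper: returns the text between <title> and </title>,
' or an empty string when the document has no title.
Function ExtractTitle(html)
    Dim re, found

    Set re = New RegExp
    re.Global = False
    re.IgnoreCase = True
    ' The outer parentheses capture the title text; the non-greedy *?
    ' stops at the first closing tag instead of the last one.
    re.Pattern = "<title>((.|\n)*?)</title>"

    Set found = re.Execute(html)
    If found.Count > 0 Then
        ' SubMatches(0) holds what the first pair of parentheses matched
        ExtractTitle = Trim(found(0).SubMatches(0))
    Else
        ExtractTitle = ""
    End If

    Set found = Nothing
    Set re = Nothing
End Function

WScript.Echo ExtractTitle("<html><head><TITLE> My Page </TITLE></head></html>")   ' My Page
```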

    As you can see, it is very easy to create a simple search engine without spending big bucks on third-party solutions. Bear in mind that this is a very simplistic approach to the search engine problem. Aside from the absent-minded nature of this engine (it will match text inside code procedures or inside HTML tags, something not always desirable), a robust solution would index each file in a separate process and store the information in a database for fast retrieval. Even so, the solution presented here is sure to satisfy many web developers in need of a simple search facility, and it certainly demonstrates what can be done with the sometimes neglected tools available in every ASP developer's toolbox.
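
    One way to make the engine a little less absent-minded is to remove markup from the file text before applying the search. The StripTags helper below is a hypothetical and deliberately rough sketch: it assumes version 5.5 of the scripting engine (for the non-greedy *? quantifier), it can be fooled by a > character inside an attribute value or a string, and it is no substitute for a real HTML parser.

```vbscript
Option Explicit

' Hypothetical helper: returns the text with ASP blocks, script blocks
' and any remaining tags replaced by a space.
Function StripTags(html)
    Dim re, text

    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True

    ' Remove server-side ASP blocks. The closing delimiter is built in
    ' two pieces so this line is safe to paste into an ASP page.
    re.Pattern = "<%(.|\n)*?%" & ">"
    text = re.Replace(html, " ")

    ' Remove client-side script blocks, including the code inside them
    re.Pattern = "<script(.|\n)*?</script>"
    text = re.Replace(text, " ")

    ' Remove whatever tags remain
    re.Pattern = "<[^>]*>"
    StripTags = re.Replace(text, " ")

    Set re = Nothing
End Function

' The word "search" in the class attribute is no longer part of the text
WScript.Echo StripTags("<p class=""search"">Find <b>me</b>, not the class name.</p>")
```

    In Listing 2 the search line would then read MatchCount = Regex.Execute(StripTags(FileText)).Count, while the title lookup would keep using the unmodified FileText.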

    Feel free to send your comments and suggestions to sixtos@prtc.net (threat mail is strongly discouraged).

