Working with Lucene.Net - Searching an index
Searching in Lucene.Net is similar to indexing and offers great functionality. It’s expected that you will spend more time in Lucene.Net’s search APIs than in the indexing ones.
There are several ways you can search your index. You can use Lucene.Net to search one index, or you can search multiple indexes using MultiSearcher. Distributing your data across two or more indexes can give you faster searching, better tuning, and greater control.
For example, you can separate your data into date ranges, perhaps creating an index for each month. This will allow you to narrow your search to a particular month’s index or combine multiple months’ indexes. (Obviously, this kind of index creation doesn’t have to be date-related; it can be based on any useful criteria.)
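As a rough illustration of the month-per-index idea, the sketch below works out which index folders cover a given date range. It does not call Lucene.Net at all; the class name MonthlyIndexes and the "index-YYYY-MM" folder naming scheme are assumptions made up for this example, not anything Lucene.Net prescribes.

```csharp
using System;
using System.Collections.Generic;

static class MonthlyIndexes
{
    // Hypothetical naming scheme: one index folder per month, e.g. "index-2006-03".
    public static string FolderFor(DateTime month)
    {
        return string.Format("index-{0:D4}-{1:D2}", month.Year, month.Month);
    }

    // Returns the folder names for every month from 'first' through 'last', inclusive.
    // If 'last' falls in an earlier month than 'first', the list is empty.
    public static List<string> FoldersFor(DateTime first, DateTime last)
    {
        List<string> folders = new List<string>();
        DateTime cursor = new DateTime(first.Year, first.Month, 1);
        DateTime end = new DateTime(last.Year, last.Month, 1);
        while (cursor <= end)
        {
            folders.Add(FolderFor(cursor));
            cursor = cursor.AddMonths(1);
        }
        return folders;
    }

    static void Main()
    {
        // A range that crosses a year boundary.
        foreach (string folder in FoldersFor(new DateTime(2006, 11, 15),
                                             new DateTime(2007, 1, 3)))
        {
            Console.WriteLine(folder);
        }
    }
}
```

Each folder name returned could then be opened with its own IndexSearcher, and the resulting searchers combined through MultiSearcher. How you construct the MultiSearcher depends on the Lucene.Net version you are using, so check its API documentation.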
In addition to the MultiSearcher, Lucene.Net also offers the RemoteSearchable capability. With RemoteSearchable, you can rely on Lucene.Net’s web server API to search one or more indexes residing on different servers.
Lucene.Net also gives you the power and flexibility of searching on one or more fields, individually weighting any of your fields, and applying Boolean query criteria such as AND, OR, NOT, NEAR, and DATE_RANGE. What’s more, you can update an index and search it at the same time. Once the index update is done, just close your searcher and reopen it, and your updated data will be available.
Our Lucene.Net example application will show you how to search the index that we created in Example 4-1, where we indexed the filesystem. Example 4-2 shows a slightly modified version of the demo code found in Lucene.Net’s source-code distribution.
Example 4-2. A Lucene.Net command-line sample application to search an index
using System;
using Analyzer = Lucene.Net.Analysis.Analyzer;
using StandardAnalyzer = Lucene.Net.Analysis.Standard.StandardAnalyzer;
using Document = Lucene.Net.Documents.Document;
using QueryParser = Lucene.Net.QueryParsers.QueryParser;
using Hits = Lucene.Net.Search.Hits;
using IndexSearcher = Lucene.Net.Search.IndexSearcher;
using Query = Lucene.Net.Search.Query;
using Searcher = Lucene.Net.Search.Searcher;

namespace Lucene.Net.Demo
{
    class SearchFiles
    {
        [STAThread]
        public static void Main(System.String[] args)
        {
            try
            {
                // Open the index folder and create the same analyzer used at indexing time
                Searcher searcher = new IndexSearcher(@"index");
                Analyzer analyzer = new StandardAnalyzer();

                // Read queries from standard input using the system default encoding
                System.IO.StreamReader streamReader =
                    new System.IO.StreamReader(
                        System.Console.OpenStandardInput(),
                        System.Text.Encoding.Default);

                while (true)
                {
                    System.Console.Out.Write("Query: ");
                    System.String line = streamReader.ReadLine();
                    // ReadLine returns null at end of input; stop on that or on an empty line
                    if (line == null || line.Length == 0)
                        break;

                    Query query = QueryParser.Parse(line, "contents", analyzer);
                    System.Console.Out.WriteLine("Searching for: " +
                        query.ToString("contents"));

                    Hits hits = searcher.Search(query);
                    System.Console.Out.WriteLine(hits.Length() +
                        " total matching documents");

                    // Show the results one page at a time
                    int HITS_PER_PAGE = 10;
                    for (int start = 0; start < hits.Length(); start += HITS_PER_PAGE)
                    {
                        int end = System.Math.Min(hits.Length(), start + HITS_PER_PAGE);
                        for (int i = start; i < end; i++)
                        {
                            Document doc = hits.Doc(i);
                            System.String path = doc.Get("path");
                            if (path != null)
                            {
                                System.Console.Out.WriteLine(i + ". " + path);
                            }
                            else
                            {
                                System.String url = doc.Get("url");
                                if (url != null)
                                {
                                    System.Console.Out.WriteLine(i + ". " + url);
                                    System.Console.Out.WriteLine(" - " + doc.Get("title"));
                                }
                                else
                                {
                                    System.Console.Out.WriteLine(i + ". " +
                                        "No path nor URL for this document");
                                }
                            }
                        }

                        // Ask before showing the next page, if there is one
                        if (hits.Length() > end)
                        {
                            System.Console.Out.Write("more (y/n) ? ");
                            line = streamReader.ReadLine();
                            if (line == null || line.Length == 0 || line[0] == 'n')
                                break;
                        }
                    }
                }
                searcher.Close();
            }
            catch (System.Exception e)
            {
                System.Console.Out.WriteLine(" caught a " + e.GetType() +
                    "\n with message: " + e.Message);
            }
        }
    }
}
In this example application, the key Lucene.Net references being used are StandardAnalyzer, Document, QueryParser, Hits, IndexSearcher, Query, and Searcher.
Understanding searchers. A Searcher is the front door to your index. Through it, you can search a single index or multiple indexes, located locally on your hard drive or remotely on different machines. The following line:
Searcher searcher = new IndexSearcher(@"index");
creates a Searcher object by instantiating an IndexSearcher. The parameter passed to IndexSearcher is the name of a folder containing an index, expressed as either a full path or a relative path.
Using analyzers in searching. We used analyzers when we created the index. Why do we need them again during searching? During indexing, we used an analyzer to clean up our raw text. The same rules must be applied to the text a user types at the search prompt. Furthermore, the same type of analyzer must be used for searching as for indexing, or the search results will not be correct. Even worse, no hits may be returned at all.
This line creates the matching analyzer:
Analyzer analyzer = new StandardAnalyzer();
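To see why a mismatched analyzer can lose hits, here is a toy sketch that does not use Lucene.Net. The two "analyzers" are hypothetical stand-ins written for this example: one lowercases and strips punctuation, the other only splits on whitespace. The index is reduced to a plain set of terms.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class AnalyzerMismatchDemo
{
    // Toy analyzer: lowercases and splits on anything that is not a letter or digit.
    public static List<string> LowercaseAnalyze(string text)
    {
        List<string> terms = new List<string>();
        StringBuilder current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                terms.Add(current.ToString());
                current.Length = 0;
            }
        }
        if (current.Length > 0)
            terms.Add(current.ToString());
        return terms;
    }

    // Toy analyzer: splits on spaces and tabs only, keeping case and punctuation.
    public static List<string> WhitespaceAnalyze(string text)
    {
        return new List<string>(
            text.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries));
    }

    // True when every query term is present among the indexed terms.
    public static bool Matches(HashSet<string> indexedTerms, List<string> queryTerms)
    {
        foreach (string term in queryTerms)
        {
            if (!indexedTerms.Contains(term))
                return false;
        }
        return true;
    }

    static void Main()
    {
        // "Index" the text with the lowercasing analyzer.
        HashSet<string> index =
            new HashSet<string>(LowercaseAnalyze("The Quick Brown Fox."));

        // Same analyzer at search time: the terms line up.
        Console.WriteLine(Matches(index, LowercaseAnalyze("Quick FOX")));   // True

        // Different analyzer at search time: "Quick" and "FOX" were never indexed.
        Console.WriteLine(Matches(index, WhitespaceAnalyze("Quick FOX")));  // False
    }
}
```

Real analyzers do more than this (stop words and so on), but the failure mode is the same: the terms produced from the query no longer line up with the terms stored in the index.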
Revisiting documents. We also covered the Document class during indexing. At search time, we use a Document object to hold information about a hit resulting from a search query. The Document object contains the fields and the data in those fields.
In our example application, a reference to a Document object is retrieved like so:
Document doc = hits.Doc(i);
Parsing user input with QueryParser. A QueryParser works hand-in-hand with an analyzer. The job of the QueryParser is to take a user’s query, apply the same rules as the analyzer, and figure out what the user is searching for.
For example, if your search query is +cat +dog, the QueryParser will know that you are searching for both the words cat and dog and that they must be in the same field.
The + option marks a term as a required part of the query.
Lucene.Net supports several such power-search features. You can do a Boolean search using OR, AND, and NOT terms, and you can limit your search to a particular field.
In our example application, the QueryParser is invoked like so:
Query query = QueryParser.Parse(line, "contents", analyzer);
Here, we pass three parameters to the parser. The first is the string that the user typed (the search query). The second parameter is the name of the default field that we will search. You can specify multiple fields, or no field at all, leaving it up to the user to identify the field to search in. The final parameter is the analyzer.
Working with search hits. A Hits collection is what you get back as a result of running a search query. If your search query returns hits, you use the Hits object to iterate over a list of Document objects.
In our example application, a reference to a Hits object is returned like so:
Hits hits = searcher.Search(query);
Remember that we instantiated a Searcher object and pointed it at our index folder. Now we’re passing it a reference to the Query object discussed previously. This kind of abstraction is what makes Lucene.Net so flexible and powerful; working with an index is consistent, regardless of whether you’re using one or more indexes and whether they’re local or remote. Additionally, the search behavior is consistent, whether you have one query or a combination of queries.
Running the SearchFiles application
When you’re ready to run the application, move to the folder where the index was created during indexing. Once you are in that folder, run the SearchFiles application by just typing its name (using the fully qualified pathname if you haven’t copied it to the same directory as the indexes).
Getting Support
Since Lucene.Net is an open source project incubated at the ASF, support for it is provided through its mailing list, noted at the project’s home page. Subscribe to the mailing list and post your questions there. Questions are answered in a timely fashion, and the community is looking to grow.
Lucene.Net in a Nutshell
Lucene.Net is a powerful, fast, and feature-rich search engine. In addition, it is open source, is incubated at ASF, and has a support community.
Today, Lucene.Net is being used to index and search filesystems, email data, web pages, and even source code. What’s more, Lucene.Net is being used in commercial applications as a web service search engine, as an embedded search engine for Outlook, and as a desktop search engine for Novell Linux via the Mono compiler.
As applications become more and more complex and generate more and more data, the addition of a search feature is becoming a logical solution. Lucene.Net’s APIs make it possible to integrate powerful search capabilities into your applications. What’s more, Lucene.Net provides the means to scale; supports different languages; and is cross-compatible with Apache Lucene at the API, algorithmic, and index levels.
—George Aroush, committer for Lucene.Net
This article is excerpted from chapter four of the book Windows Developer Power Tools, written by James Avery and Jim Holmes (O'Reilly; ISBN: 0596527543).