Working with Lucene.Net - Using Lucene.Net
Lucene.Net is not a standalone search engine application. You can’t use it as-is to index and search your data or the Web: out of the box, it can’t extract or read binary data (such as Microsoft Office or PDF files), make use of SQL data, or crawl the Web.
Understanding this limitation is the key to appreciating what Lucene.Net can do. What it offers is a rich set of APIs that you call first to create a Lucene.Net index and later to search that index. Extracting raw text from your binary data is your job: you have to write the code that reads formats such as Microsoft Office files, extracts the raw text, and passes it to Lucene.Net, where it can finally be indexed and later searched.
After your raw text data has been indexed, you can use Lucene.Net’s API to search this data. Indexing and searching via Lucene.Net’s APIs is easy and yet very powerful.
A Brief History of Lucene.Net
Lucene.Net’s origins can be traced back to its parent project, Apache Lucene. Apache Lucene is written in Java, is well established as an Apache Software Foundation (ASF) project, and has a solid following in the open source community. Lucene.Net is a port of Apache Lucene to C# that utilizes the Microsoft .NET Framework, and it preserves the look and feel of Apache Lucene’s API.
If you open any C# file and its corresponding Java file, you’ll see that, with the exception of the naming conventions, the class names and method names are the same. For example, org.apache.lucene.store.FSDirectory.createOutput() in Java becomes Lucene.Net.Store.FSDirectory.CreateOutput() in C#. It’s not only the classes and methods that are ported to C#, though; the Lucene algorithms are ported too, as well as the Lucene index format.
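The naming rule suggested by the FSDirectory example can be sketched in a few lines of Java. This is only an illustration: the class LuceneNamePort and its toDotNetName method are hypothetical helpers, not part of Lucene or Lucene.Net, and the rule they encode (swap the org.apache.lucene root for Lucene.Net, then capitalize the first letter of each remaining segment) is an assumption drawn from that one example. Treat it as a rule of thumb, since the real port may deviate from it in places.

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of Lucene or Lucene.Net.
final class LuceneNamePort {
    private static final String JAVA_ROOT = "org.apache.lucene.";
    private static final String DOTNET_ROOT = "Lucene.Net";

    // Assumed rule: replace the Java root package with the Lucene.Net
    // namespace and capitalize the first letter of every remaining segment.
    static String toDotNetName(String javaName) {
        if (!javaName.startsWith(JAVA_ROOT)) {
            throw new IllegalArgumentException("not a Lucene name: " + javaName);
        }
        String rest = javaName.substring(JAVA_ROOT.length());

        // Remember a trailing "()" so method references keep their parentheses.
        boolean isCall = rest.endsWith("()");
        if (isCall) {
            rest = rest.substring(0, rest.length() - 2);
        }

        StringJoiner joined = new StringJoiner(".");
        joined.add(DOTNET_ROOT);
        for (String segment : rest.split("\\.")) {
            if (segment.isEmpty()) {
                throw new IllegalArgumentException("empty segment in: " + javaName);
            }
            joined.add(Character.toUpperCase(segment.charAt(0)) + segment.substring(1));
        }
        return joined + (isCall ? "()" : "");
    }

    public static void main(String[] args) {
        // Prints: Lucene.Net.Store.FSDirectory.CreateOutput()
        System.out.println(toDotNetName("org.apache.lucene.store.FSDirectory.createOutput()"));
    }
}
```

Only the casing of the first letter changes; the rest of each class or method name is carried over untouched, which is why the two code bases read so similarly side by side.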
This consistent port offers a number of advantages. First, it means someone familiar with Lucene’s Java implementation will have an easy time reading Lucene.Net’s C# code.
More importantly, it means applications using Lucene.Net can coexist with applications using the Java version. Indexes can be read, modified, and shared between the two versions. What’s more, both the Java and C# versions can share Lucene’s lock file, so Apache Lucene and Lucene.Net can use the same index concurrently.
Finally, in addition to the C# port of Lucene’s core code, the Lucene test code is also ported to C#. All of the ported NUnit tests pass, just as their counterparts do in the Java version. This should give you a high level of confidence in the C# port of the code.
Two groups of APIs make up Lucene.Net: the indexing APIs and the search APIs. You will spend most of your time writing code for the search APIs. However, before you can start searching, you must create indexes.
This article is excerpted from chapter four of the book Windows Developer Power Tools, written by James Avery and Jim Holmes (O'Reilly; ISBN: 0596527543). Check it out today at your favorite bookstore.