Searching MCMS with SharePoint - Microsoft SharePoint Portal Server Search
To use SharePoint Portal Server Search to full advantage, you need to understand how it works and how to configure it. Before we explain how it works, you should be familiar with a few key components:
- A content source contains the information that will be indexed. Content sources can be external websites, file shares, Windows SharePoint Services sites, Microsoft Exchange public folders, or any other system, such as Lotus Notes, for which a protocol handler for SharePoint Search is available.
- Index files contain crawled content from one or more content sources. Aggregating and cataloging content from disparate content sources enables future search queries to be much more efficient. Index files can also be copied or propagated to SharePoint Web servers for more efficient searching. Two indexes are created by default when you create a new portal: Portal_Content and Non_Portal_Content. As expected, the former contains all content stored in the portal while the latter contains content outside of the portal.
- Search scopes provide a logical grouping of content sources for end users to search. For example, a company may have multiple internal file shares and websites. An employee looking for a specific document doesn't care whether it's in site A or file share B; they just know it's out there. An administrator can create multiple content sources and group them together in a single search scope that users can search against. In addition, search scopes can be configured to include only specific portions of a website, providing even more granular control over what content is indexed and searchable by your users.
- The SharePoint gatherer is responsible for crawling all content sources, extracting content, removing noise words (such as 'and', 'a', 'the', and 'or', to name only a few; the noise word files are customizable, so you can add your own noise words), and creating the index files that are used when search queries are executed.
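To make the relationship between content sources, the gatherer, noise words, and index files concrete, here is a toy in-memory model. It is only an illustration of the concepts, not SharePoint's API or index format; the names `gather`, `NOISE_WORDS`, and the sample sources and documents are hypothetical, and the tokenizer is deliberately naive.

```python
import re
from collections import defaultdict

# Hypothetical noise-word list; SharePoint keeps its own list in editable noise word files.
NOISE_WORDS = {"and", "a", "the", "or"}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def gather(content_sources):
    """Crawl every content source and build an inverted index.

    content_sources maps a source name to {document URL: document text}.
    The result maps each indexed word to the set of (source, URL) pairs containing it.
    """
    index = defaultdict(set)
    for source, documents in content_sources.items():
        for url, text in documents.items():
            for word in tokenize(text):
                if word not in NOISE_WORDS:  # noise words are dropped before indexing
                    index[word].add((source, url))
    return dict(index)

# Two hypothetical content sources: the MCMS site and a file share.
sources = {
    "TropicalGreenSite": {"/TropicalGreen/Plants/Hibiscus.htm": "The hibiscus and the orchid"},
    "FileShareB": {"file://fileserver/docs/care.doc": "Watering a hibiscus"},
}
index = gather(sources)
print("the" in index)  # False: noise words never reach the index
```

Because the index is built once per crawl, later queries only look up words in `index` instead of rereading every source, which is the efficiency gain the index files provide.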
The gatherer is part of the MSSearch service, which performs the content crawling and creates the index files. MSSearch activates the gatherer according to schedules that you configure through the SharePoint Central Administration tool, and the gatherer generates a master index that is used for search queries.
An end user selects a search scope to choose the collection of content sources to query. SharePoint then looks at the catalog containing those content sources and determines the best candidates that match the search query.
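The query side can be sketched the same way: a scope names a set of content sources, and only hits from those sources are returned. This is a simplified, hypothetical model (the `INDEX`, `SCOPES`, and `search` names are illustrative); it returns documents containing every query word rather than ranking "best candidates" the way SharePoint does.

```python
import re

NOISE_WORDS = {"and", "a", "the", "or"}

# Hypothetical pre-built index: word -> set of (content source, document URL).
INDEX = {
    "hibiscus": {("TropicalGreenSite", "/TropicalGreen/Plants/Hibiscus.htm"),
                 ("FileShareB", "file://fileserver/docs/care.doc")},
    "watering": {("FileShareB", "file://fileserver/docs/care.doc")},
    "orchid": {("TropicalGreenSite", "/TropicalGreen/Plants/Orchid.htm")},
}

# A search scope is a named group of content sources.
SCOPES = {
    "All Plant Info": {"TropicalGreenSite", "FileShareB"},
    "Website Only": {"TropicalGreenSite"},
}

def search(query, scope):
    """Return sorted URLs, within the scope, of documents containing every query word."""
    words = [w for w in re.findall(r"[a-z0-9]+", query.lower()) if w not in NOISE_WORDS]
    if not words:
        return []
    allowed = SCOPES[scope]
    hits = None
    for word in words:
        # Keep only postings whose content source belongs to the chosen scope.
        postings = {hit for hit in INDEX.get(word, set()) if hit[0] in allowed}
        hits = postings if hits is None else hits & postings
    return sorted(url for _, url in hits)

print(search("the hibiscus", "All Plant Info"))
# ['/TropicalGreen/Plants/Hibiscus.htm', 'file://fileserver/docs/care.doc']
print(search("hibiscus", "Website Only"))
# ['/TropicalGreen/Plants/Hibiscus.htm']
```

The user never names a file share or site directly; choosing "All Plant Info" or "Website Only" is enough to decide which content sources are consulted.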
Preparing the MCMS Site for Indexing
Before we can configure SharePoint to index our MCMS site, there are a few steps we need to take to make the indexing more efficient and useful. First and foremost, check whether your site has the MCMS option Map Channel Names to Host Header Names enabled. If so, you'll need to disable it, because one of the two options available to us, the MCMS Connector, does not support host header names. For the rest of this chapter, we will assume our site exists in the top-level channel TropicalGreen.
If your site uses the Map Channel Names to Host Header Names option, you may need to rename the top-level channel to match the channel we use in this example (namely TropicalGreen).
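Renaming matters because disabling the option changes where the top-level channel name appears in a posting's URL: with mapping on it acts as the host name, and with mapping off it becomes the first path segment. The sketch below is a simplified illustration of hierarchical posting URLs under each setting; the `posting_url` helper, server name, and channel names are hypothetical, and real MCMS URLs can also take other forms.

```python
def posting_url(server, channel_path, posting, map_to_host_headers):
    """Build a simplified posting URL under either channel-mapping setting.

    channel_path lists the channel names below the root channel, top level first.
    """
    if map_to_host_headers:
        # The top-level channel name doubles as the host header.
        host, *subchannels = channel_path
    else:
        # The top-level channel becomes an ordinary path segment on the server.
        host, subchannels = server, list(channel_path)
    path = "/".join(subchannels + [posting])
    return f"http://{host}/{path}"

print(posting_url("localhost", ["www.tropicalgreen.net", "Plants"], "Hibiscus.htm", True))
# http://www.tropicalgreen.net/Plants/Hibiscus.htm
print(posting_url("localhost", ["TropicalGreen", "Plants"], "Hibiscus.htm", False))
# http://localhost/TropicalGreen/Plants/Hibiscus.htm
```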
In addition, our example assumes you've set up MCMS and SharePoint according to Appendix A, Setting up MCMS & SPS on the Same Virtual Server. If your MCMS Web Entry Point and SharePoint portal are not on the same virtual server, this requirement may not affect you.
Second, we'll configure our site for guest access. The majority of our Tropical Green site is intended to be available to any anonymous visitor. However, because one section of the site is restricted, we will also set up a new account with read access to the entire site for SharePoint to use as it crawls. We'll then filter the results so that the user running a search sees only the items he or she has access to.
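This filtering step is a form of security trimming: because the index is built by an account that can read everything, each result must be checked against the searching user's rights before it is displayed. The minimal sketch below assumes a hard-coded table of restricted URL prefixes; `RESTRICTED`, the `Members` group, and the function names are hypothetical stand-ins for the access check MCMS itself would perform.

```python
# Hypothetical access rules: URL prefix -> groups allowed to read it.
# Anything not listed is treated as open to anonymous visitors.
RESTRICTED = {
    "/TropicalGreen/MembersOnly/": {"Members"},
}

def can_read(url, user_groups):
    """Return True if a user belonging to user_groups may read the page at url."""
    for prefix, groups in RESTRICTED.items():
        if url.startswith(prefix):
            return bool(groups & user_groups)
    return True

def trim_results(results, user_groups):
    """Keep only the search hits the current user is allowed to see."""
    return [url for url in results if can_read(url, user_groups)]

# Hits returned from an index crawled with the read-everything account.
hits = ["/TropicalGreen/Plants/Hibiscus.htm", "/TropicalGreen/MembersOnly/Offers.htm"]
print(trim_results(hits, set()))        # anonymous visitor: restricted page removed
print(trim_results(hits, {"Members"}))  # member: both pages kept
```

Trimming at query time keeps a single index for everyone, at the cost of an access check per result.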
Next, we need to address how MCMS and output caching behave on requests for postings. The default page rendering behavior of MCMS is not friendly to SPS search performance: because all MCMS requests return an HTTP status code of 200, SharePoint will always perform a full crawl of our site rather than an incremental crawl. We explained the details of what happens with each index crawl request, and implemented a solution, in Chapter 4, Preparing Postings for Search Indexing.
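Chapter 4's implementation is not reproduced here. As a generic sketch of the underlying HTTP mechanism, a server that sends a `Last-Modified` header and answers a crawler's `If-Modified-Since` request with `304 Not Modified` lets the crawler skip unchanged pages instead of refetching everything. The `respond` function below is hypothetical and framework-free, written in Python rather than ASP.NET.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def respond(last_modified, if_modified_since=None):
    """Return (status, headers) for a request for a posting.

    last_modified: timezone-aware UTC datetime when the posting last changed.
    if_modified_since: raw If-Modified-Since header sent by the crawler, if any.
    """
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # malformed header: fall back to sending the full page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
            if last_modified <= since:
                return 304, headers  # unchanged since the last crawl
    return 200, headers

changed = datetime(2005, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
print(respond(changed))                                   # first crawl: 200
print(respond(changed, "Wed, 01 Jun 2005 12:00:00 GMT"))  # incremental crawl: 304
```

Returning 200 unconditionally, as default MCMS rendering does, gives the crawler no way to tell that a page is unchanged.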
Finally, we'll add to our templates a control supplied with the MCMS Connector for SharePoint Technologies. This control makes additional metadata properties available to the index crawler, giving users more information when they search our site.
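The Connector's control is used as-is in this chapter and is not reproduced here. To illustrate the general idea of exposing metadata to a crawler, the hypothetical function below renders a posting's properties as HTML `<meta>` tags, escaping values so they remain valid markup; the function name and property names are assumptions, not the control's actual output.

```python
from html import escape

def render_meta_tags(properties):
    """Render posting properties as HTML <meta> tags that a crawler can index."""
    lines = []
    for name, value in sorted(properties.items()):
        # escape() also escapes quotes, so values cannot break out of the attribute.
        lines.append(f'<meta name="{escape(name)}" content="{escape(str(value))}">')
    return "\n".join(lines)

print(render_meta_tags({"Title": "Hibiscus", "Description": "Care & feeding"}))
# <meta name="Description" content="Care &amp; feeding">
# <meta name="Title" content="Hibiscus">
```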
This article is excerpted from chapter five of the book Advanced Microsoft Content Management Server Development, written by Lim Mei Ying et al. (PACKT, 2005; ISBN: 1904811531).