These steps configure the SharePoint search engine to index Confluence pages, so that a search within a MOSS Search site shows both SharePoint and Confluence results.

Prerequisite: Microsoft Office SharePoint Server (MOSS) 2007 Required

Note that this search capability requires Microsoft Office SharePoint Server (MOSS) 2007 (Standard or Enterprise) and will not work with Windows SharePoint Services (WSS v3). Searching with Microsoft's new Search Server 2008 is likely possible, but is not yet tested. See Search Configuration for Search Server 2008 for more details.

Prerequisite: July 2008 Infrastructure Update, Service Pack 1, or the Forms Authentication Hotfix

MOSS 2007 shipped without the ability to crawl web sites such as Confluence that use Forms Based Authentication (FBA). There are several ways to update MOSS to provide this functionality. Please review the SharePoint Search Prerequisite Updates for the details. If you think SharePoint may already be updated, you can skip the prerequisite updates; you will be instructed to install the hotfix later if there is a problem. Note that installing the hotfix is not the only solution, as discussed in the prerequisite updates.

Troubleshooting

If you are still having problems after going through the steps below, check out the Troubleshooting SharePoint Search documentation.

Create a Confluence Search Source

To search Confluence, several search entities need to be created: a Content Source, a Crawl Rule, a Scope, and the registration of a Security Trimmer. All of this is done for you through two Confluence search administration screens described below. You can also do this manually, but this requires a bit more effort.

Run SharePoint Central Administration on the SharePoint server (Start -> Administrative Tools -> SharePoint 3.0 Central Administration) and click on the shared service provider, likely named "SharedServices1", in the quick launch bar along the left side of the web page.

Screenshot: Selecting your shared service provider

On the shared services home page click on Confluence Search Settings to get to the Manage Confluence Search page.

Screenshot: The shared services home page

On the Manage Confluence Search page click on New Confluence Search Source to get to the Confluence Search Source page.

Screenshot: Selecting your Confluence search source

Fill in the data on the page.

Screenshot: The search source configuration screen

Warning

Forms Authentication should currently be used even if you have Confluence set up using NTLM. The MOSS crawler has problems crawling Confluence directly under NTLM, even though it can crawl other Windows authenticated web sites just fine. This issue can be tracked as JIRA item CSI-59. If you are running Tomcat under IIS, one option is to have the crawler search using the Tomcat port. Then you will want to set up a Server Name Mapping (under MOSS Search Settings) so that search results show the IIS port instead of the Tomcat port.
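
The effect of a Server Name Mapping can be pictured with a small sketch. This is only an illustration of the idea (the crawled address is swapped for the address users should see), not how MOSS implements it. The host name and the ports 8080 (Tomcat) and 80 (IIS) are hypothetical, as is the helper name map_server_name.

```python
from urllib.parse import urlsplit, urlunsplit


def map_server_name(url: str, crawled_netloc: str, displayed_netloc: str) -> str:
    """Replace the host:port the crawler used with the one shown in results."""
    parts = urlsplit(url)
    if parts.netloc.lower() != crawled_netloc.lower():
        return url  # URL was not crawled through the mapped address
    return urlunsplit(parts._replace(netloc=displayed_netloc))


# Hypothetical addresses: the crawler reads Confluence on the Tomcat port,
# while users reach it through IIS on the default HTTP port.
crawled_url = "http://wiki.example.com:8080/display/DEMO/Home"
print(map_server_name(crawled_url, "wiki.example.com:8080", "wiki.example.com"))
# prints: http://wiki.example.com/display/DEMO/Home
```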

Note

If you do use Windows Authentication for the crawl rule, make sure you include the domain as part of the search crawl user name (e.g., "DOMAIN\user").

Once you have created your Confluence Search Source you can do any of the following:

Click the link under the Content Source heading to edit the Confluence Search Source.
Click on the link under the Crawl Rule heading to edit the generated crawl rule. Typically there is no need to modify the crawl rule.
Click on the link under the Scope heading to view the generated scope's properties and rules. Typically there is no need to modify the scope, but this screen does show you when the scope will be available for use (it is not available immediately).
Open the menu for the Content Source item to:
- Edit the Confluence Search Source.
- Delete Confluence Search Source. This will delete the generated content source, crawl rule, and scope.
- Edit the generated Content Source. This is recommended since the generated content source does not yet have crawl schedules set up.
- View the Crawl Log
  
  Screenshot: Adjusting the Confluence search settings

Crawl Confluence Content Source and View Crawl Log Results

Now that we have a content source and crawl rule, we can crawl the content source.

As in the first step of the previous section, run SharePoint Central Administration, choose your shared service provider (e.g., "SharedServices1"), navigate to "Search Settings" under the Search group, and then choose "Content sources and crawl schedules".
Crawl the content source you created.

Screenshot: Crawling content under SharePoint

Refresh the page periodically until the content source status is back at Idle.
Then view the Crawl Log (just above "Start Full Crawl" on the drop down).
- You should see several green results with maybe a few yellow items indicating that pages could not be crawled/found.
If it looks like it is working, now may be a good time to edit your content source and specify a full and incremental crawl schedule if you have not done so yet.

Screenshot: Setting crawl schedules

Test Search

Now that the search indexing is set up and your Confluence content source has been crawled, it is time to test the search from SharePoint. To search content outside of SharePoint, you need to use a MOSS Search site. If you already have a MOSS Search site, simply navigate to the site, enter a search expression and see your results. The Confluence result links should direct you to your Confluence pages. If you don't have a MOSS Search site, follow the steps below.

Navigate to the SharePoint Site that you want to contain your search site. This is typically done as a site off of the root (top-level) site, but can be done from any site.
Choose Site Actions -> Create -> Sites and Workspaces.
Enter the title, description, and URL of your search site. Choose the "Search Center" or "Search Center with Tabs" template under the Enterprise tab (the "Search Center with Tabs" template is only available if the "Office SharePoint Server Publishing Infrastructure" site collection feature is activated).

Screenshot: Creating a SharePoint search site

When search is performed within this site it needs to communicate with Confluence for the search security trimmer. If it cannot, it will not show any Confluence results. If this search site has a parent site that already has Confluence configured, you should be fine. If not, you will need to configure the Confluence settings for the search site (see SharePoint Feature Configuration).

Enable Your Scope

Now that you have a search site, you can configure SharePoint to use custom scopes on the search drop down that shows on all SharePoint pages.

Screenshot: The SharePoint search drop-down

Enable Custom Scopes

The first step in this process is to enable custom scopes.

Go to the top level site settings page for your SharePoint site collection in which you want to have a Confluence scope (Site Actions -> Site Settings). This is not done with SharePoint 3.0 Central Administration. You may see a sub-menu under Site Settings. If so, choose Modify All Site Settings. Within site settings choose "Search settings" under Site Collection Administration.

Screenshot: The top level settings page for a SharePoint site collection

Within Search Settings choose "use custom scopes" and provide the URL of the search site that you may have created above (we used "search", so we would enter "/search").

Screenshot: Setting custom search scope

Now if you refresh any page in SharePoint the search drop down should show "All Sites" as an option.

Enable the Scope

After the scope has been generated (see above), you must enable it within your site collection.
Go to the top level site settings page (Site Actions -> Site Settings) and choose "Search scopes" under Site Collection Administration.

Screenshot: The top level settings page for a SharePoint site collection

Click the "Search Dropdown" display group

Screenshot: Selecting a search scope

Check the Confluence scope to enable it in the search dropdown.

Screenshot: Setting scope display options

Fine Tuning Crawl Configuration

Tuning Crawl

Because the crawling of Confluence content is done by a generic Web Site crawler, there may be additional configuration steps needed for some content to be crawled.

The following steps are likely needed for Confluence attachments to be crawled.

The crawler may fail to crawl some content unless you add a new "action" file type to the list of files that are crawled. This is done by going to the Search Settings within your Shared Service Provider (from SharePoint Central Administration) and clicking on the "File Types" link. From there you can add the "action" file type.
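
To see why the "action" file type matters, the sketch below extracts the extension from a few Confluence-style URLs. It assumes the crawler decides whether to fetch a URL from the extension at the end of the URL path, which is a simplification. The example URLs and the helper name url_extension are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def url_extension(url: str) -> str:
    """Return the lower-cased extension of the URL path, without the dot."""
    path = urlsplit(url).path  # drops the query string and fragment
    return posixpath.splitext(path)[1].lstrip(".").lower()


# Hypothetical Confluence URLs.
for url in (
    "http://wiki.example.com/pages/viewpage.action?pageId=123",
    "http://wiki.example.com/download/attachments/123/report.pdf",
    "http://wiki.example.com/display/DEMO/Home",
):
    print(repr(url_extension(url)), url)
```

Under that assumption, many Confluence pages end in ".action", so they are skipped unless "action" is in the file types list.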

Preventing SharePoint from crawling some Confluence content

After adding the "action" file type, the crawl can take considerably longer because many more pages are now crawled, including administrative pages. To keep these unwanted pages out of the crawl you must set up exclude crawl rules. This is done by going to the Search Settings within your Shared Service Provider (from SharePoint Central Administration) and clicking on the "Crawl Rules" link. From there you need to add the following exclusions:

<Confluence URL>/download/temp/*
<Confluence URL>/spaces/usage/*
<Confluence URL>/admin/*
<Confluence URL>/users/*

To locate the crawl rules configuration, from SharePoint Central Administration:

Click on the shared service provider, likely named "SharedServices1" in the quick launch bar along the left side of the web page.
Click on Search settings.
In the Configure Search Settings page click on Crawl rules.

The crawl rule that was already generated for your Confluence site should remain untouched, except that the rules must be ordered so that the exclude rules are above the include rule. Also, do not check the "Include complex URLs" checkbox in any of the exclude crawl rules.
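
The reason the order matters can be illustrated with a simplified model in which rules are checked from top to bottom and the first matching rule decides. This is a sketch, not SharePoint's actual matching code. The base URL, the "/*" pattern used for the generated include rule, and the helper names are assumptions made for the example.

```python
from fnmatch import fnmatchcase

BASE = "http://wiki.example.com"  # hypothetical Confluence URL

EXCLUDE_PATHS = ["/download/temp/*", "/spaces/usage/*", "/admin/*", "/users/*"]


def build_rules(base_url: str) -> list:
    """Exclude rules first, then the include rule (pattern assumed to be base/*)."""
    base = base_url.rstrip("/")
    rules = [(base + path, "exclude") for path in EXCLUDE_PATHS]
    rules.append((base + "/*", "include"))
    return rules


def decide(url: str, rules: list) -> str:
    """Return the action of the first rule whose pattern matches the URL."""
    for pattern, action in rules:
        if fnmatchcase(url.lower(), pattern.lower()):
            return action
    return "default"  # no rule matched; the model says nothing further


rules = build_rules(BASE)
admin_page = BASE + "/admin/console.action"
print(decide(admin_page, rules))                  # prints: exclude
print(decide(admin_page, list(reversed(rules))))  # prints: include
```

With the include rule on top, it matches every Confluence URL first, so the exclude rules below it never take effect in this model.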

After making the above changes you can start another full crawl of your content source.

Ideally we would also exclude "<Confluence URL>/spaces/*", but that appears to prevent the crawling of attachments. This can be a little misleading in that attachments in the demonstration space are crawled even if you have this exclude rule. However, other attachments do not appear to be crawled.

Crawling attachments

If the above changes do not help with crawling attachments, try changing your Confluence content source to use a start address that points to the "/spaces/listattachmentsforspace.action" page. Then start another full crawl of your content source.

This start address should allow for all Confluence content to be crawled and not just attachments. If you feel that is not the case for your environment, you may want to put both your base Confluence URL and the "list attachments for space" URL in the start address field.
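
As a small sketch of the two options above, the start address values can be built from the base Confluence URL. The base URL and the helper name start_addresses are hypothetical; the path is the one named in this section.

```python
def start_addresses(base_url: str, include_base: bool = True) -> list:
    """Build the start addresses to enter for the Confluence content source."""
    base = base_url.rstrip("/")  # avoid a double slash when joining
    addresses = [base + "/spaces/listattachmentsforspace.action"]
    if include_base:
        addresses.insert(0, base)  # also crawl from the Confluence home URL
    return addresses


# Hypothetical base URL; one start address per line.
print("\n".join(start_addresses("http://wiki.example.com/")))
```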

Still Having Problems?

If you are still having problems after going through the steps above, check out the Troubleshooting SharePoint Search documentation.
