These steps configure the SharePoint search engine to index Confluence pages, so that a search within a MOSS Search site shows both SharePoint and Confluence results.
Prerequisite: Microsoft Office SharePoint Server (MOSS) 2007 Required
Note that this search capability requires Microsoft Office SharePoint Server (MOSS) 2007 (Standard or Enterprise) and will not work with Windows SharePoint Services (WSS v3). Searching with Microsoft's new Search Server 2008 is likely possible, but is not yet tested. See Search Configuration for Search Server 2008 for more details.
Prerequisite: July 2008 Infrastructure Update, Service Pack 1, or the Forms Authentication Hotfix
MOSS 2007 shipped without the ability to crawl web sites, such as Confluence, that use Forms Based Authentication (FBA). There are several ways to update MOSS to provide this functionality; see SharePoint Search Prerequisite Updates for details. If you think SharePoint may already be updated, you can skip the prerequisite updates; you will be told to install the hotfix later if there is a problem. (Installing the hotfix is not the only solution, as discussed in the prerequisite updates.)
Troubleshooting
If you are still having problems after going through the steps below, check out the Troubleshooting SharePoint Search documentation.
To search Confluence, several search entities must be created: a Content Source, a Crawl Rule, a Scope, and a registered Security Trimmer. The two Confluence search administration screens described below create all of these for you. You can also create them manually, but that takes more effort.
Run SharePoint Central Administration on the SharePoint server (Start -> Administrative Tools -> SharePoint 3.0 Central Administration) and click your shared service provider (likely named "SharedServices1") in the Quick Launch bar on the left side of the page.
Screenshot: Selecting your shared service provider
On the shared services home page click on Confluence Search Settings to get to the Manage Confluence Search page.
Screenshot: The shared services home page
On the Manage Confluence Search page click on New Confluence Search Source to get to the Confluence Search Source page.
Screenshot: Selecting your Confluence search source
Fill in the fields on the Confluence Search Source page.
Screenshot: The search source configuration screen
Warning
For now, use Forms Authentication even if Confluence is set up with NTLM. The MOSS crawler has problems crawling Confluence directly under NTLM, even though it crawls other Windows-authenticated web sites without trouble. This issue is tracked as JIRA issue CSI-59. If you run Tomcat behind IIS, one option is to have the crawler use the Tomcat port directly. In that case, set up a Server Name Mapping (under MOSS Search Settings) so that search results show the IIS port instead of the Tomcat port.
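To illustrate what a Server Name Mapping accomplishes, here is a minimal Python sketch (not how MOSS implements it): a result URL that uses the crawled Tomcat address has its scheme and host:port replaced with the IIS address before it is shown to users. The host names and ports are hypothetical placeholders.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical addresses: the crawler reaches Confluence on Tomcat's port,
# while users should be sent to the IIS front end.
CRAWLED_PREFIX = "http://confluence.example.com:8080"
DISPLAY_PREFIX = "http://confluence.example.com"


def map_server_name(url: str, crawled: str = CRAWLED_PREFIX,
                    display: str = DISPLAY_PREFIX) -> str:
    """Swap the scheme and host:port of url if it came from the crawled address."""
    parts = urlsplit(url)
    src = urlsplit(crawled)
    if (parts.scheme, parts.netloc.lower()) != (src.scheme, src.netloc.lower()):
        return url  # not from the crawled server; leave it unchanged
    dst = urlsplit(display)
    # Keep the path, query, and fragment; only the server part changes.
    return urlunsplit((dst.scheme, dst.netloc, parts.path, parts.query, parts.fragment))


print(map_server_name("http://confluence.example.com:8080/display/DS/Home"))
```

The path and query string are preserved, so only the server portion of each Confluence link changes.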
Note
If you do use Windows Authentication for the crawl rule, include the domain in the crawl account user name (e.g., "DOMAIN\user").
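A minimal Python sketch of building the crawl account name in the DOMAIN\user (down-level logon) form. The function name and the sample domain and account are hypothetical.

```python
def crawl_account_name(domain: str, user: str) -> str:
    """Return user qualified as DOMAIN\\user for a Windows Authentication crawl rule."""
    if "\\" in user:
        return user  # already domain-qualified
    if not domain:
        raise ValueError("a domain is required for Windows Authentication")
    return f"{domain}\\{user}"


print(crawl_account_name("CORP", "svc-crawl"))  # CORP\svc-crawl
```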
Once you have created your Confluence Search Source, you can do any of the following:
Now that we have a content source and a crawl rule, we can crawl the content source.
Now that search indexing is set up and your Confluence content source has been crawled, it is time to test search from SharePoint. To search content outside of SharePoint, you need a MOSS Search site. If you already have one, navigate to it, enter a search expression, and review the results. The Confluence result links should take you to your Confluence pages. If you don't have a MOSS Search site, follow the steps below.
When a search is performed within this site, the site must communicate with Confluence for the search security trimmer. If it cannot, no Confluence results are shown. If this search site has a parent site that already has Confluence configured, you should be fine. Otherwise, configure the Confluence settings for the search site (see SharePoint Feature Configuration).
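The trimming behavior described above can be modeled roughly as follows. This is a conceptual Python sketch, not the connector's actual security trimmer: `check_access` is a hypothetical stand-in for the call to Confluence, and the Confluence URL prefix is a placeholder. Non-Confluence results pass through unchanged, while Confluence results are kept only if Confluence confirms access, so an unreachable Confluence yields no Confluence results.

```python
from typing import Callable, Iterable, List, Set

CONFLUENCE_PREFIX = "http://confluence.example.com/"  # hypothetical


def trim_results(results: Iterable[str],
                 check_access: Callable[[List[str]], Set[str]]) -> List[str]:
    """Keep non-Confluence results, plus Confluence results the user may view."""
    results = list(results)
    confluence = [u for u in results if u.startswith(CONFLUENCE_PREFIX)]
    try:
        # Hypothetical call: ask Confluence which of these URLs the user can see.
        allowed = check_access(confluence) if confluence else set()
    except ConnectionError:
        allowed = set()  # Confluence unreachable: show no Confluence results
    return [u for u in results
            if not u.startswith(CONFLUENCE_PREFIX) or u in allowed]
```

Failing closed (dropping Confluence results on error) avoids showing pages the user may not be allowed to see.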
Now that you have a search site, you can configure SharePoint to use custom scopes in the search drop-down that appears on every SharePoint page.
Screenshot: The SharePoint search drop-down
The first step in this process is to enable custom scopes.
With the scope created in the earlier steps, you must now enable it within your site collection.
Go to the top-level site settings page (Site Actions -> Site Settings) and choose "Search scopes" under Site Collection Administration.
Screenshot: The top level settings page for a SharePoint site collection
Because Confluence content is crawled by the generic Web Site crawler, some content may need additional configuration before it is crawled.
The following steps are likely needed for Confluence attachments to be crawled.
The crawler may fail to crawl some content unless you add "action" to the list of file types that are crawled. To do this, go to the Search Settings of your Shared Service Provider (from SharePoint Central Administration), click the "File Types" link, and add the "action" file type.
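Confluence pages such as the attachment list are served from URLs ending in ".action". The Python sketch below is a simplified model of why that matters, assuming the crawler skips URLs whose path extension is not in its file types list and treats extensionless URLs as ordinary pages; the real crawler's rules may differ, and the file types set is a hypothetical subset.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical subset of the crawled file types list (not the MOSS default list).
FILE_TYPES = {"aspx", "htm", "html", "pdf", "doc"}


def url_extension(url: str) -> str:
    """Lower-case extension of the URL path, without the dot ('' if none)."""
    path = urlsplit(url).path  # drops the query string, e.g. ?key=DS
    return posixpath.splitext(path)[1].lstrip(".").lower()


def would_crawl(url: str, file_types: set) -> bool:
    """Assumed rule: crawl extensionless URLs, otherwise only listed extensions."""
    ext = url_extension(url)
    return ext == "" or ext in file_types


page = "http://confluence.example.com/spaces/listattachmentsforspace.action?key=DS"
print(would_crawl(page, FILE_TYPES))               # False
print(would_crawl(page, FILE_TYPES | {"action"}))  # True
```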
After you add the "action" file type, the crawl can take considerably longer because many more pages are crawled, including administrative pages. To remove these unwanted pages from the crawl log, you must set up exclude crawl rules. Go to the Search Settings of your Shared Service Provider (from SharePoint Central Administration) and click the "Crawl Rules" link. From there, add the following exclusion:
To locate the crawl rules configuration from SharePoint Central Administration, open the Search Settings of your Shared Service Provider; on the Configure Search Settings page, click Crawl rules. You already have a crawl rule for your Confluence site; leave it untouched, except that the rules must be ordered so that the exclude rules are above the include rule. Also, do not check the "Include complex URLs" checkbox in any of the exclude crawl rules.
After making these changes, start another full crawl of your content source.
Ideally, we would also exclude "<Confluence URL>/spaces/*", but that appears to prevent attachments from being crawled. This can be misleading: attachments in the demonstration space are crawled even with this exclude rule in place, but other attachments do not appear to be crawled.
If the above changes do not help with crawling attachments, try changing your Confluence content source to use a start address that points to the "/spaces/listattachmentsforspace.action" page, and then start another full crawl of your content source.
This start address should allow all Confluence content, not just attachments, to be crawled. If that does not seem to be the case in your environment, you may want to put both your base Confluence URL and the "list attachments for space" URL in the start address field.
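A minimal Python sketch that builds both start addresses from a base Confluence URL, for entry one per line in the start address field. The base URL and context path are hypothetical. Appending the path, rather than calling `urllib.parse.urljoin` with "/spaces/...", matters when Confluence runs under a context path: `urljoin` would resolve the leading slash against the server root and drop "/confluence".

```python
def start_addresses(base_url: str) -> list:
    """Base Confluence URL plus the list-attachments page beneath it."""
    base = base_url.rstrip("/")
    # Plain concatenation keeps any context path (e.g. /confluence) intact.
    return [base, base + "/spaces/listattachmentsforspace.action"]


print("\n".join(start_addresses("http://confluence.example.com:8080/confluence/")))
```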
If you are still having problems after going through the steps above, check out the Troubleshooting SharePoint Search documentation.