I'm in my third and final day of MOSS 2007 Enterprise Search Advanced Training. There's a best practice from yesterday's session that I want to discuss. Microsoft recommends that content owners / end users use spaces in filenames to make them searchable (e.g., "MOSS 2007 White Paper" and not "MOSS2007WhitePaper"). The indexer treats a filename without spaces as a single word. Tests during the session showed that filenames using underscores ("MOSS_2007_White_Paper") and dots ("MOSS.2007.White.Paper") are also indexed as a single word rather than as individual words. Therefore a search for a keyword that is only part of such a filename (e.g., "MOSS") will not return the document.
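To make the behavior from the session concrete, here is a minimal sketch that models the indexer as splitting text on whitespace only. This is an assumption for illustration, not the actual MOSS 2007 word breaker, and the function names are hypothetical. Under this model, only the space-separated filename yields "moss" as a separate token.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only, lowercased for case-insensitive matching.

    Simplified model of the behavior described above; NOT the real
    MOSS 2007 word breaker.
    """
    return text.lower().split()


def matches(keyword: str, filename: str) -> bool:
    # A keyword matches only if it equals one whole token.
    return keyword.lower() in whitespace_tokens(filename)


for name in ["MOSS 2007 White Paper", "MOSS2007WhitePaper",
             "MOSS_2007_White_Paper", "MOSS.2007.White.Paper"]:
    print(f"{name!r}: {matches('MOSS', name)}")
```

Only the first filename prints `True`. Underscores and dots are not whitespace, so the whole name stays one token and "MOSS" never matches it.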
That's significant, but spaces in filenames can cause problems of their own, and there is more than one example out there. These may not be SharePoint problems, but you may well have compelling reasons not to use spaces in your file names. At the very least, a file-naming convention is difficult to enforce across an organization, especially when there's no out-of-the-box filename validation at upload time.
My recommendation is to make the Title field required in SharePoint document libraries, and/or to train content managers on the importance of using this field. Title is very important in ranking and relevance, so much so that MOSS 2007 search uses a text extraction algorithm that generates TWO title values for every MS Office document: the actual Title field from the document property sheet, and a "shadow" title based on the formatting and placement of the words at the top of the document. If content managers populate the Title field with meaningful words, the format of the original filename matters much less for search results.
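The following is a minimal sketch of why a populated Title field compensates for an unsearchable filename. It assumes a simplified model in which a document is found if the keyword is a whole whitespace-separated token in either the filename or the Title. The `Document` class and `is_found` function are hypothetical, and the model ignores ranking, relevance weighting, and the "shadow" title entirely.

```python
from dataclasses import dataclass


@dataclass
class Document:
    filename: str
    title: str = ""  # the SharePoint Title field; empty if nobody filled it in


def tokens(text: str) -> set[str]:
    # Simplified whitespace-only tokenizer (an assumption, not the MOSS word breaker).
    return set(text.lower().split())


def is_found(doc: Document, keyword: str) -> bool:
    # Found if the keyword is a whole token in the filename OR the title.
    return keyword.lower() in tokens(doc.filename) | tokens(doc.title)


untitled = Document("MOSS_2007_White_Paper.docx")
titled = Document("MOSS_2007_White_Paper.docx", title="MOSS 2007 White Paper")
print(is_found(untitled, "MOSS"))  # False
print(is_found(titled, "MOSS"))    # True
```

The filename is identical in both cases; only the populated Title makes "MOSS" a searchable word.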