Online movie-rental company Netflix announced yesterday that it is offering a million-dollar prize to anyone who can improve its recommendation system. In an interview on NPR yesterday, James Bennett, vice president for recommendation systems at Netflix, suggested that the issue is not so much with the existing tool, but with the metadata associated with movies in its database. He said that one way to improve the system could be to offer a questionnaire that a user could fill out to improve their recommendations, with such questions as "Do you like movies with happy endings?" However, in such a system, if the user answers yes to that question, the application must then find movies that have happy endings, and there is currently no data to support this. What Netflix knows about movies is who made them and who's in them, some basic genre and ratings information, and any associated awards. This is a good start, but it's not enough to provide the subtlety of recommendations that users want, and it will be a gigantic job to re-tag the database of movies if someone develops a better solution.
This is why, as you design your corporate portal, document management system, or any other database particular to your company or industry, it's not enough to go with the application's default metadata, or with What You've Always Done. You need to think carefully about the metadata you are collecting on, say, a repository of documents. How might your employees want to use them, not just now, but five, ten, twenty years from now? Having worked extensively with Microsoft's SharePoint Portal Server 2003 (now Microsoft Office SharePoint Server, or MOSS 2007), I know how easy it is to start storing documents and information without building in any useful metadata at all. The tool won't do it for you - you need to put the process and the standards in place first.
To cite just one common example - suppose your firm wants to help the younger and newer employees by showing who at the company has expertise in particular areas. How do you define expertise? One way might be to see how many documents an individual has authored on a subject. SharePoint Portal Server automatically tags every uploaded document with "Created By," which is the login name of the uploader. But what if the uploader is the expert's secretary? The new employee searching on the subject will find a lot of documents "by" that secretary. It might be possible to infer who the real author is, but what if the secretary works for more than one person? By the time you decide you need to include an "author" field on all document libraries, there might be hundreds or thousands of documents already in the system. Who's going to go back and enter the author information for all of them? But if you make the "author" field required only for uploaded documents going forward, you won't be able to capture all the good work that's already been done.
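To make the "Created By" problem concrete, here is a minimal Python sketch. It is not SharePoint code; the Document class, the experts_by_field helper, the login names, and the document titles are all made up for illustration. It counts documents per person on a subject twice, once by uploader and once by an explicit author field, and shows how many legacy documents have no author at all.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    # Hypothetical record; field names are assumptions, not a SharePoint schema.
    title: str
    subject: str
    created_by: str               # uploader's login, filled in automatically
    author: Optional[str] = None  # explicit author; None for legacy uploads


def experts_by_field(docs, subject, field):
    """Count documents on a subject per person, using the given field."""
    counts = Counter()
    for doc in docs:
        if doc.subject != subject:
            continue
        name = getattr(doc, field)
        if name is not None:
            counts[name] += 1
    return counts


# Made-up sample data: one secretary (jsmith) uploads for two experts.
docs = [
    Document("Lease review checklist", "real estate", "jsmith", author="mlopez"),
    Document("Zoning memo", "real estate", "jsmith", author="mlopez"),
    Document("Title search notes", "real estate", "jsmith"),  # legacy, no author
    Document("Easement primer", "real estate", "jsmith", author="kchen"),
]

by_uploader = experts_by_field(docs, "real estate", "created_by")
by_author = experts_by_field(docs, "real estate", "author")
unattributed = sum(
    1 for d in docs if d.subject == "real estate" and d.author is None
)
```

Counting by created_by credits every document to the secretary, while counting by author separates the two experts but silently drops the legacy document. That dropped document is the backlog someone would have to go back and tag by hand.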
It can't be stressed enough - plan and strategize how to handle metadata BEFORE you implement a new system, especially one like SPS or MOSS.
Now here's my suggestion for Netflix. The problem is, if I say I loved a certain action film, Netflix recommends a bunch of other action films in a similar time period and style. But maybe I loved it because of clever dialogue or an exotic setting, and not because of its genre or actors. How about this: after I rate it four stars, give me a standard list of choices for "Why?" The plot, the stars, the special effects, the script, etc. The choices could even be multi-select or rankable. Do I actually have to code it to get the million bucks?
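For what it's worth, the "Why?" idea is easy to sketch. The Python below is only an illustration under my own assumptions: the reason list, the Rating class, the scoring rule, and the movie titles are all hypothetical, and none of it reflects how Netflix actually works. It builds a profile of the reasons I gave for movies I rated highly, then ranks candidate movies by how often other viewers cited those same reasons.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical fixed list of choices offered after a rating.
REASONS = ("plot", "stars", "special effects", "script", "setting")


@dataclass
class Rating:
    movie: str
    stars: int
    reasons: tuple = ()  # multi-select answers to "Why?"

    def __post_init__(self):
        unknown = set(self.reasons) - set(REASONS)
        if unknown:
            raise ValueError(f"unknown reasons: {sorted(unknown)}")


def reason_profile(ratings, min_stars=4):
    """Tally the reasons attached to movies rated at least min_stars."""
    profile = Counter()
    for rating in ratings:
        if rating.stars >= min_stars:
            profile.update(rating.reasons)
    return profile


def score(candidate_reasons, profile):
    """Weight a candidate's reason counts by the viewer's own profile."""
    return sum(profile[reason] * n for reason, n in candidate_reasons.items())


# Made-up ratings from one viewer.
my_ratings = [
    Rating("Heist Movie A", 4, ("script", "setting")),
    Rating("Action Movie B", 2, ("special effects",)),
    Rating("Caper C", 5, ("script",)),
]

# Made-up aggregate of reasons other viewers gave for each candidate.
candidates = {
    "Explosive Sequel": Counter({"special effects": 10, "stars": 3}),
    "Talky Thriller": Counter({"script": 7, "plot": 2}),
}

profile = reason_profile(my_ratings)
ranked = sorted(candidates, key=lambda m: score(candidates[m], profile), reverse=True)
```

With this sample data, the viewer who praised scripts and settings gets "Talky Thriller" ranked ahead of "Explosive Sequel", regardless of genre. The catch is the same one as above: the ranking is only as good as the reason data collected, so the question has to be asked from the start.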