30 November 02005

A cure for messy music metadata?

Gracenote has added over 650,000 CDs to its database in the seven and a half months since I last checked. That's quite a lot, and unfortunately it seems likely that there are a significant number of duplicate records among them: cases where the same CD appears with the title or artist name written in a slightly different format. I noted before that the Gracenote database used four different ways of writing the titles of the six CDs in the Anthology of American Folk Music collection; now there seem to be one or two extra ways on top of the original four.

These inconsistencies create annoying problems for people trying to find particular albums or tracks on their MP3 players, as noted in this Wired article by Dan Goodin. His solution is to get tag editor software and sort out the metadata formats the way you want them, on your own. But surely there should be a less labour-intensive option?

Instead of tens of thousands of listeners re-editing the same tags and storing them privately, shouldn't there be some means of comparing the 'standard' tags on your local version of a file (artist, album and track name, date of release etc) with a clean, standardised and well-reviewed database — MusicBrainz being an obvious example — and then giving you the option of syncing your local tags with the standard format? After which you could add your own personal tags, or tweak all instances of a particular tag on your local version if you wanted to.
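To make the idea concrete, here is a minimal sketch in Python of the compare-then-sync step. Everything in it is an assumption for illustration: the field names, the `diff_tags` and `sync_tags` helpers, and the sample records are all made up, and the "reference" record is just a dictionary standing in for whatever a shared database would return. It does not call any real MusicBrainz or Gracenote interface.

```python
# Hypothetical set of 'standard' fields; a real tagger would have more.
STANDARD_FIELDS = ("artist", "album", "title", "date")


def diff_tags(local, reference, fields=STANDARD_FIELDS):
    """Return {field: (local_value, reference_value)} for fields that differ."""
    changes = {}
    for field in fields:
        if field not in reference:
            continue  # the reference has nothing to say about this field
        if local.get(field) != reference[field]:
            changes[field] = (local.get(field), reference[field])
    return changes


def sync_tags(local, reference, accepted, fields=STANDARD_FIELDS):
    """Return a copy of `local` with only the accepted standard fields updated.

    Personal tags (anything outside `fields`) are never touched.
    """
    updated = dict(local)
    for field in accepted:
        if field in fields and field in reference:
            updated[field] = reference[field]
    return updated


# Invented sample data: the local file has a variant spelling and a personal tag.
local = {
    "artist": "Example Band, The",
    "album": "Example Album (Disc 1)",
    "title": "First Track",
    "my_rating": "5",
}
reference = {
    "artist": "The Example Band",
    "album": "Example Album, Volume 1",
    "title": "First Track",
    "date": "2005",
}

proposed = diff_tags(local, reference)
for field, (mine, theirs) in proposed.items():
    print(f"{field}: {mine!r} -> {theirs!r}")

# The listener opts in field by field; here they keep their own album title.
synced = sync_tags(local, reference, accepted=["artist", "date"])
```

The point of the design is that the listener is only ever offered changes, never forced into them, and that personal tags such as the made-up `my_rating` pass through untouched.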

By keeping its database proprietary, Gracenote ensures that its greatest asset also becomes its greatest millstone and drain on its resources: they do not seem to be able to maintain the database to anything near reasonable quality. And listeners' experience of digital music suffers as a result. I levelled a similar kind of charge at the Music Genome Project in my last post.

My point is not ideologically driven; it's simple pragmatics and economics of effort. If you can get amateurs to research, interrogate, review and improve your data for free, you'd be a fool not to. Major works like the Oxford English Dictionary relied on such volunteer labour, and would never have been completed without it.

In that case, the volunteer effort did not even stop Oxford University Press from owning the exclusive rights to the finished product.

In the case of music metadata, there is extra incentive for volunteers to contribute their efforts to a shared database and shared standards, because there is a clear pay-off: their investment of time is lower, and the resulting quality of data they can use is higher.

There is a tactical skill in positioning a media database in the right part(s) of the private-collective spectrum to get the incentive/ownership equations well-tuned. And getting the incentives right is crucial to sustainability.

Finally, for anyone who's 'keeping score' like I am, in the same period that Gracenote added 650,000+ CDs to its database, MusicBrainz added 82,000+ albums to its database.

