10 April 02005

How many CDs are there in the world? Gracenote and metadata

At the end of last year, I tentatively made the prediction that "the catalogue of music recordings readily available in the northern hemisphere will continue to increase by 50% every five years until 02025 when it may start to plateau or saturate". But I can't test this prediction until I have some reliable measure of the catalogue and of how much of it is 'readily available'.
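To make the arithmetic behind that prediction concrete, here is a minimal sketch in Python. It assumes smooth compounding between the five-year marks, which the prediction itself does not specify, and the function name is purely illustrative.

```python
def growth_factor(years, rate_per_period=0.5, period_years=5):
    """Factor by which the catalogue grows over `years`, assuming it
    increases by `rate_per_period` every `period_years` years and
    compounds smoothly in between (an assumption, not part of the
    original prediction)."""
    return (1 + rate_per_period) ** (years / period_years)


# Equivalent annual growth rate implied by 50% every five years.
annual_rate = growth_factor(1) - 1

# Cumulative growth over the 20 years from 2005 to 2025.
twenty_year_factor = growth_factor(20)

print(f"Implied annual growth: {annual_rate:.1%}")
print(f"Growth over 20 years: x{twenty_year_factor:.2f}")
```

On these assumptions, 50% every five years works out to about 8.4% a year, and to roughly a fivefold increase in the catalogue over the 20 years to 02025.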

I'm still drawing a blank even on the simple measure of how many CD titles have been issued to date. Last week Gracenote announced that their CDDB® database for music recognition has been used two billion times to identify CDs. They claim CDDB "contains the largest online database of music information in the world". As of today it has data for 3,598,785 CDs and 46,002,354 songs (note that the iTunes Music Store has only 2.5% of these songs available).

Is CDDB a good measure of the total catalogue of CDs? I've heard reports of up to 5% of CDs not being recognised by CDDB (though the only time I've experienced this was with a spoken word CD), which would suggest that CDDB underestimates the total catalogue. However, it also overestimates the number of CDs, because the database contains several duplicate entries. I have the six CDs of the Anthology of American Folk Music, edited by Harry Smith, on my iPod, with metadata taken from CDDB. But two of the six CDs appear twice in the database, and one appears three times. You can see this by going to the web interface for the database and searching on 'Anthology of American Folk Music' (n.b. a fourth volume was released separately from the original three-volume, six-CD set). Try it for other albums as well.

There is no quick way to assess the scale of the underestimates or overestimates in CDDB, which undermines its usefulness as a measure of the CD catalogue.
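If both error rates were known, combining them would be simple. The sketch below is purely illustrative: it assumes (hypothetically) that a known fraction of database entries are duplicates, and that the reported 'not recognised' rate equals the fraction of issued CDs missing from the database. Neither assumption is established, and the duplicate fractions tried here are made-up values, not measurements.

```python
def estimate_titles(entries, duplicate_fraction, missing_fraction):
    """Rough estimate of distinct CD titles issued.

    entries            -- number of entries in the database
    duplicate_fraction -- assumed share of entries that are duplicates
    missing_fraction   -- assumed share of issued CDs absent from the database
    """
    if not 0 <= duplicate_fraction < 1 or not 0 <= missing_fraction < 1:
        raise ValueError("fractions must be in the range [0, 1)")
    # Remove duplicates to get the distinct titles actually in the database...
    distinct_in_db = entries * (1 - duplicate_fraction)
    # ...then scale up to allow for the titles the database does not hold.
    return distinct_in_db / (1 - missing_fraction)


CDDB_ENTRIES = 3_598_785  # figure quoted in this post

# Hypothetical duplicate rates, paired with the reported 5% miss rate.
for dup in (0.05, 0.10, 0.20):
    est = estimate_titles(CDDB_ENTRIES, dup, 0.05)
    print(f"duplicates {dup:.0%}, missing 5%: about {est:,.0f} titles")
```

The two corrections pull in opposite directions and cancel exactly when the two fractions are equal, so without real figures for both there is no telling whether the headline count is too high or too low.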

Another annoyance with CDDB is the inconsistent presentation of data. The metadata I got from CDDB for the six CDs has four different ways of writing the disc titles and numbering the discs in the collection:

  • Anthology of American Folk Music - 1-B Ballads;
  • Anthology of American Folk Music: Ballads 1-A;
  • Anthology Of American Folk Music (Disc 4);
  • Anthology Of American Folk Music, Vol. 3 (Disc 2).

If you're trying to locate a particular disc on a small iPod screen (or even on a 15" computer screen), this inconsistency makes it very difficult to work out which is which, and what order they're supposed to go in.

As I wrote last year, Gracenote is one of the new 'gatekeeper' organisations that have a 'silent' influence on how people discover and locate digital music. CDDB is used by AOL, Apple, Philips and Sony in their digital music offerings, and the network effect may make it pre-eminent in the market. That would have worrying implications if Gracenote doesn't clean up its data.

Happily there is a 'community' music metadatabase alternative to CDDB, in the shape of MusicBrainz, which — on my very small sample — seems more reliable: my search for 'Anthology of American Folk Music' found no duplicates and only one disc title in a slightly different format from the others.

But MusicBrainz cannot currently be a measure for the total CD catalogue, since, at the time of writing, it claims only 252,602 CDs and 3,102,305 tracks (compared with CDDB's 3.6 million and 46 million respectively).
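For a sense of scale, the comparison can be worked through in a few lines of Python. The figures are the ones quoted in this post; the helper names are illustrative only.

```python
# Database sizes as quoted in this post (10 April 2005).
CDDB = {"cds": 3_598_785, "tracks": 46_002_354}
MUSICBRAINZ = {"cds": 252_602, "tracks": 3_102_305}


def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


def tracks_per_cd(db):
    """Average number of tracks per CD in a database."""
    return db["tracks"] / db["cds"]


cd_share = share(MUSICBRAINZ["cds"], CDDB["cds"])
track_share = share(MUSICBRAINZ["tracks"], CDDB["tracks"])

print(f"MusicBrainz CDs as a share of CDDB: {cd_share:.1f}%")
print(f"MusicBrainz tracks as a share of CDDB: {track_share:.1f}%")
print(f"Tracks per CD, CDDB: {tracks_per_cd(CDDB):.1f}")
print(f"Tracks per CD, MusicBrainz: {tracks_per_cd(MUSICBRAINZ):.1f}")
```

On these numbers MusicBrainz holds roughly 7% as many CDs and tracks as CDDB, while the average number of tracks per CD is similar in both (between 12 and 13).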

Posted by David Jennings in section(s) Curatorial, Future of Music, Long Now, Music and Multimedia on 10 April 02005

This article is fascinating.

Gracenote certainly is not a trustworthy source of music market information. Also, I certainly do not want them handling any of my personal information, such as what is in my music collection.

MusicBrainz and FreeDB are nice and seem to be more trustworthy. I also enjoy allmusic.com; its data is by far richer and more comprehensive than anything else I have accessed on the internet.

Posted by: Don R. on 26 May 02005 at 11:36 AM

Thanks for the kind comment, Don.

A few weeks after I wrote this I bumped into Rob Kaye, the main guy behind MusicBrainz, while he was in Europe. Quite rightly he called me on the point that I'd spotted an error in the MusicBrainz database and hadn't logged in to correct it. I have now. You can't do that with CDDB.

Posted by: David Jennings on 26 May 02005 at 12:01 PM

Gracenote usually comes on automatically when I put music into my Walkman, but not always... why is that?

Posted by: roy roska on 31 October 02006 at 9:49 AM

Roy, if it's an intermittent fault, I suggest you direct your question to Sony.

Posted by: David Jennings on 31 October 02006 at 10:01 AM