Chromaprint 0.2

I’m not good at releasing code (this should have happened a long time ago), but I’ve finally released version 0.2 of Chromaprint. The main changes are new functions in the public API for working with raw fingerprints and for safely releasing memory allocated internally by Chromaprint.

How does Chromaprint work?

I’ve been meaning to write this post for a long time, but never really finished it. I hope it will help people understand how the Chromaprint algorithm works, where the individual ideas come from and what the fingerprints really represent. It’s not meant to be a detailed description, just the basics needed to get the general idea.

Acoustid database dump available

I’ve finally written a script that takes a consistent dump of the Acoustid database in the tab-separated text format used by PostgreSQL’s COPY command. I don’t have any tools for importing it into PostgreSQL, so that has to be done manually by running SQL commands, but if anybody is interested in playing with the database, you can download it here (2.7G compressed with bzip2, 6.4G uncompressed). The data is licensed under a Creative Commons BY-SA 3.0 License. I’ll add a cron job to export the database at the beginning of every month.
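For anyone who wants to peek at the dumped rows without loading them into PostgreSQL first, here is a minimal sketch of a reader for the COPY text format: one row per line, columns separated by tabs, `\N` meaning NULL, and backslash escapes such as `\t`, `\n` and `\\`. It assumes each table has already been extracted to its own plain-text file; the function names are hypothetical, and octal/hex escapes are mapped with `chr()`, which is only correct for ASCII.

```python
import re
from typing import Iterable, Iterator, List, Optional

# Single-letter escapes defined by the COPY text format.
_SIMPLE_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

# A backslash followed by a hex escape (\xHH), an octal escape (\OOO),
# or any other single character, which stands for itself (e.g. \\ -> \).
_ESCAPE_RE = re.compile(r"\\(x[0-9A-Fa-f]{1,2}|[0-7]{1,3}|.)", re.DOTALL)


def _unescape(match: re.Match) -> str:
    esc = match.group(1)
    if esc in _SIMPLE_ESCAPES:
        return _SIMPLE_ESCAPES[esc]
    if esc[0] == "x" and len(esc) > 1:
        # Simplification: treat the byte value as a code point (ASCII only).
        return chr(int(esc[1:], 16))
    if esc[0] in "01234567":
        return chr(int(esc, 8))
    return esc


def parse_copy_line(line: str) -> List[Optional[str]]:
    """Split one COPY text-format row into fields; a bare \\N becomes None."""
    fields: List[Optional[str]] = []
    # Literal tabs and newlines never occur inside data (they are escaped),
    # so it is safe to strip the line ending and split on raw tabs.
    for raw in line.rstrip("\r\n").split("\t"):
        fields.append(None if raw == "\\N" else _ESCAPE_RE.sub(_unescape, raw))
    return fields


def read_copy_rows(lines: Iterable[str]) -> Iterator[List[Optional[str]]]:
    """Yield parsed rows, stopping at the optional end-of-data marker \\."""
    for line in lines:
        if line.rstrip("\r\n") == "\\.":
            return
        yield parse_copy_line(line)
```

Usage would be something like `for row in read_copy_rows(open(path, encoding="utf-8")): ...`, where `path` is whichever extracted table file you want to inspect. To actually load the data, a `COPY ... FROM` statement per table remains the simpler route.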

Acoustid submission API extended to accept PUIDs

As strange as it might sound, the Acoustid submission API can now also accept PUIDs instead of MBIDs. I’ve had the idea of using PUIDs to help bootstrap the Acoustid database for a long time, but I avoided implementing it because I was afraid of bringing all the PUID↔MBID matching errors into the database. The topic came up yesterday and I realized that, with the audio fingerprints, MBIDs and MusicBrainz metadata in the same database, I can fairly easily remove any suspicious matches. So I’ve made two changes to the submission API that should make importing untagged audio files easier:

Chromaprint plug-in for GStreamer

It’s been a long time since I learned a new framework (or even wrote some real code), so I decided to write a GStreamer plug-in that wraps libchromaprint and makes it very easy to generate Chromaprint fingerprints in GStreamer applications. This was inspired by a similar plug-in that Milosz Derezynski wrote for MusicDNS/libofa (that plug-in is now integrated into the official GStreamer distribution). Using the code from gst-ofa and gst-template as a starting point, it turned out to be pretty easy. After a couple of hours, I was able to run commands like this and get valid fingerprints:

Acoustid moved to a new server

Since I announced the Acoustid project, I’ve received over 1.1 million fingerprint submissions (mostly from MusicBrainz editors), covering about 580 thousand unique MusicBrainz track IDs. In the background I was running a process that imported the raw submissions and merged similar tracks. All this was done on a virtual machine with 1GB of RAM. It wasn’t very fast, but I was surprised it could handle that amount of data at all. At some point I also enabled the lookup service, but I didn’t want to announce it, because fingerprint searches on the server took too long.

Binary fingerprint compression

While working on the Acoustid web service, I had a hard time deciding how to send fingerprints to the server. The fingerprints are vectors of fairly large 32-bit numbers. Sending the numbers in binary (which would be ideal) is not easy, because almost all web standards expect textual data, if not plain ASCII. The usual trick is to base64-encode binary data, but that increases the size by 33% (every 3 input bytes become 4 output characters), which wasn’t acceptable for me. So I came up with the idea of compressing the data with a special-purpose algorithm, base64-encoding the compressed data, and ideally also compressing the base64-encoded text with gzip. The double compression might seem weird, but it allows me to use only standard web tools, and the resulting size is still smaller than with plain binary encoding.

Acoustid Fingerprinter 0.1 released

After I posted about the beta version of Acoustid Fingerprinter, some people successfully used it to submit fingerprints, and I also started using it as my main tool for submitting fingerprints, so I think it’s time for a proper release. More details on the wiki:

MusicBrainz database replication

I wanted to work on the Acoustid lookup web service, and for this I needed to set up a replicated MusicBrainz database slave. Normally I’d use the Perl code from mb_server, but I didn’t want to mess up the setup on the machine hosting acoustid.org with CPAN packages. I had planned to write a non-Perl version of the replication code earlier, but only yesterday did I really need it. It turned out to be easier than I expected, so I already have a working version that I use to update my local MB database. Get it from GitHub if you would like to give it a try.