Saturday, 27 September 2008

music brain - part 1

I sometimes feel as though I am so focussed on one particular thing (at the moment Arum DataEye) that the other skills I am not using much at the time might become rusty.

As a result I have embarked on a little side project... Music Brain. It has probably been done before, but it is really just a programming exercise for myself more than a serious project.

Problem: The new Metallica album, Death Magnetic, was released recently and I want to know how "Metallica" it really is. My gut feeling as a human is that it's pretty Metallica. :) But I got to wondering whether a computer could make the same judgement.

Solution: Train a neural net with my library of Metallica tracks, except for the new album. Then run the new album through the neural net and see if the program agrees that this is a Metallica album. I will also run a few other bands' tracks through the neural net after training has been completed and see what it "thinks" about them. Hopefully the program will be able to differentiate between existing Metallica tracks and other bands, while still being able to identify new Metallica music.

Approach: Somehow I will have to extract the music from my music library and feed it into the neural net. How many input nodes will I need? How can I extract the raw music?

Today's first steps were to read up on the WAV file format. It is a pretty straightforward format, and using the information on this page I was able to create a "WAVIterator" class which takes an input stream and, using the Iterator pattern, allows me to pull "MusicSamples" out of a WAV file. In my API a "MusicSample" contains one or more channels of data for a very small slice of time.
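To make the idea concrete, here is a minimal sketch of how an iterator over PCM data might look. It is not the actual WAVIterator: the class name PcmFrameIterator is hypothetical, each "sample" is represented as a plain short[] with one value per channel, and the stream is assumed to be already positioned at the start of 16-bit little-endian sample data (i.e. the header has been consumed).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: iterates over frames of 16-bit little-endian PCM. */
final class PcmFrameIterator implements Iterator<short[]> {
    private final InputStream in;
    private final int channels;
    private short[] lookahead; // next frame to hand out, or null at end of data

    PcmFrameIterator(InputStream in, int channels) {
        this.in = in;
        this.channels = channels;
        this.lookahead = readFrame();
    }

    @Override
    public boolean hasNext() {
        return lookahead != null;
    }

    @Override
    public short[] next() {
        if (lookahead == null) {
            throw new NoSuchElementException();
        }
        short[] current = lookahead;
        lookahead = readFrame();
        return current;
    }

    /** Reads one value per channel; returns null on end of stream or a truncated frame. */
    private short[] readFrame() {
        try {
            byte[] raw = new byte[channels * 2];
            int off = 0;
            while (off < raw.length) {
                int n = in.read(raw, off, raw.length - off);
                if (n < 0) {
                    return null;
                }
                off += n;
            }
            short[] frame = new short[channels];
            for (int c = 0; c < channels; c++) {
                int lo = raw[2 * c] & 0xFF; // low byte, treated as unsigned
                int hi = raw[2 * c + 1];    // high byte, keeps its sign
                frame[c] = (short) ((hi << 8) | lo);
            }
            return frame;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading one frame ahead keeps hasNext() cheap and free of side effects, at the cost of doing a read in the constructor.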

I have placed some restrictions on the files I am going to be able to parse in order to make life easier. Basically the WAV file must have been recorded with 16 bits per sample using PCM (uncompressed) at 44,100 Hz. This matches the sample rate and bit depth of CD audio, so I figured that if it is good enough for the human ear it would be good enough for this program.
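A check for those restrictions could be sketched roughly as follows. This assumes the common "canonical" WAV layout, where the "fmt " chunk immediately follows the RIFF/WAVE header; real files may carry extra chunks, which this sketch does not handle. The class and method names (WavFormatCheck, isSupported) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: accepts only 16-bit PCM at 44,100 Hz in a canonical header. */
final class WavFormatCheck {
    static boolean isSupported(byte[] header) {
        if (header.length < 36) {
            return false; // too short to hold the RIFF header plus a 16-byte fmt chunk
        }
        if (!tag(header, 0).equals("RIFF")
                || !tag(header, 8).equals("WAVE")
                || !tag(header, 12).equals("fmt ")) {
            return false;
        }
        // All multi-byte fields in a WAV header are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int audioFormat = buf.getShort(20) & 0xFFFF;   // 1 means uncompressed PCM
        int sampleRate = buf.getInt(24);
        int bitsPerSample = buf.getShort(34) & 0xFFFF;
        return audioFormat == 1 && sampleRate == 44100 && bitsPerSample == 16;
    }

    /** Reads a four-character chunk tag at the given offset. */
    private static String tag(byte[] bytes, int offset) {
        return new String(bytes, offset, 4, StandardCharsets.US_ASCII);
    }
}
```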

The resulting test program was an interesting and humbling reminder of how slow an unbuffered FileInputStream really is. My test code was taking nearly 100 seconds to parse the WAV file, which was a little disappointing. So I decided to load the whole file into memory first and use a ByteArrayInputStream instead. Wow! This reduced the parse time to just over 4 seconds, much more like it.

However, there are a number of problems with this approach. Firstly, what if the file size is greater than Integer.MAX_VALUE, i.e. larger than the maximum length of an array in Java? This is unlikely to happen, since Integer.MAX_VALUE is 2^31 - 1, or 2147483647 (about 2 GB for a byte array), but that leads on to the next problem: you will soon run out of memory if you create 2 GB byte arrays. It just isn't good practice to assume that your files will be small enough, or to risk trying to allocate an array that big.

Of course, after a minute's pondering I remembered that there's a class just for this type of occasion: BufferedInputStream. So I replaced 9 or 10 lines of code by wrapping my FileInputStream in a BufferedInputStream, and the performance was just as good. It feels good to get back to basics sometimes.
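For reference, the wrapping looks roughly like this. It is only a sketch: BufferedReadDemo and countBytes are hypothetical names, and the method simply counts bytes rather than parsing anything. The point is that single-byte read() calls are served from the in-memory buffer instead of each one going to the underlying file.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: byte-at-a-time reads through a buffered stream. */
final class BufferedReadDemo {
    /** Reads the file one byte at a time and returns the number of bytes seen. */
    static long countBytes(String path) throws IOException {
        // The buffer is filled in large chunks from the file; read() then
        // hands out bytes from memory until the buffer needs refilling.
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }
}
```

Unlike loading the whole file into a byte array, this keeps memory use bounded by the buffer size regardless of how large the file is.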

My next task will be to work out how to create a neural net in Java with appropriate inputs and outputs for my extracted music data. I don't really want to code a whole neural net, so I will probably use an existing framework... more next time.
