My first dive into machine learning
Something I had wanted to tick off my programming bucket list for a while was learning about machine learning and neural networks. I was originally inspired by YouTuber Cary Huang, who has created some amazing videos on the topic, using animated characters to explain the concepts in a fun and engaging way. I wanted to try it for myself, but I quickly realised machine learning is considerably harder than it looks. I was also very young at the time. However…
Wekinator
My professor recently introduced me to a piece of software called Wekinator. It lets you create a neural network without writing any code, and interestingly it sends and receives data over OSC. While I would rather do it all in code, admittedly the GUI makes it very easy to use, with the bonus of being able to communicate easily with Max.
In some ways, a neural network is like an advanced interpolation tool. You train it with a set of inputs and their expected outputs, and after training it will try to predict new outputs from new inputs. The more data you train it on, the more accurate its predictions become. Like a human brain, it slowly learns patterns and relationships in the data. Wekinator is a great tool for this, as it lets you train the network in real time and see the results immediately.
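That train-then-predict workflow can be sketched in a few lines of Python. To keep it self-contained, this toy version classifies a new input by its single nearest trained neighbour; Wekinator's actual models are proper neural networks, so treat this only as an illustration of the workflow, not of what Wekinator does internally:

```python
import math

# Toy "train then predict" loop: store example (input, output) pairs,
# then classify a new input by its nearest trained neighbour.
# (A sketch of the workflow only -- Wekinator's real models are
# neural networks, not this 1-nearest-neighbour shortcut.)

training_examples = []  # list of (input_vector, label) pairs

def train(inputs, label):
    training_examples.append((inputs, label))

def predict(inputs):
    def distance(example):
        vector, _ = example
        return math.dist(vector, inputs)
    _, label = min(training_examples, key=distance)
    return label

# Train on two simple 2-D inputs...
train([0.0, 0.0], "low")
train([1.0, 1.0], "high")

# ...then predict for an unseen input close to the second example.
print(predict([0.9, 0.8]))  # -> high
```

The more examples you feed into `train`, the finer-grained the predictions become, which is exactly the behaviour you see when adding examples in Wekinator's GUI.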
My first project
To get started, I decided to get Wekinator to guess which chord I was playing on a synth. I used the Ableton synth ‘Operator’ and sent the audio’s chromagram (a 12-element vector representing the presence of each pitch class) to Wekinator using Max. Wekinator was set up to receive 12 values and output a class from 1-3 for the chord it thought it was hearing: 1 was C major, 2 was F major and 3 was G major. Here’s how I set it up:
When this screenshot was taken, I was playing middle C on a piano. To calculate the chromagram, I used FluCoMa’s fluid.chroma object. We can send the outputted values to Wekinator (see the right of the patch) and train it to recognise the chords. I trained it on a few different chords, and it was able to predict the one I was playing with a high degree of accuracy. I was very impressed.
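To make the chord-matching idea concrete, here is a rough Python sketch of it: each chord becomes a 12-element pitch-class vector (a crude stand-in for a chromagram), and an incoming vector is matched to the closest template. The real analysis is done by fluid.chroma in Max, and these hand-built binary templates are my own illustration, not FluCoMa's output:

```python
# Build a 12-element pitch-class vector for each chord and classify an
# incoming vector by the closest template (least squared error).
# Hand-made stand-in for the Max/fluid.chroma + Wekinator pipeline.

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_vector(notes):
    """12-element vector with 1.0 at each pitch class in the chord."""
    vector = [0.0] * 12
    for note in notes:
        vector[PITCH_CLASSES.index(note)] = 1.0
    return vector

# The three chords from the project, as classes 1-3.
TEMPLATES = {
    1: chord_vector(["C", "E", "G"]),  # C major
    2: chord_vector(["F", "A", "C"]),  # F major
    3: chord_vector(["G", "B", "D"]),  # G major
}

def classify(chroma):
    """Return the class (1-3) whose template is closest to `chroma`."""
    def error(item):
        _, template = item
        return sum((a - b) ** 2 for a, b in zip(chroma, template))
    best_class, _ = min(TEMPLATES.items(), key=error)
    return best_class

# A slightly noisy F major chromagram (energy at C, F and A) -> class 2.
noisy_f = [0.9, 0.0, 0.1, 0.0, 0.05, 1.0, 0.0, 0.1, 0.0, 0.95, 0.0, 0.0]
print(classify(noisy_f))  # -> 2
```

A trained network does the same job more flexibly: instead of fixed templates, it learns its own mapping from chromagram to chord class from the examples you give it.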
Something a little more complex
At the end of the day, Wekinator takes numbers and spits out more numbers. Therefore it can be used for virtually anything, so let’s take it a step further.
Courtesy of Ben E. C. Boyter, I found a dataset of letters of the alphabet: each letter has a 20x20 image for both its upper and lowercase forms, in up to 200 fonts.
I wrote a Python script to compile a huge JSON file with all of the data (around 80MB!), representing each letter as a set of 400-element vectors. I then wrote another Python script to send all this data to Wekinator, setting it up to accept 400 inputs (one per pixel) and output one of 52 classes (26 letters, upper and lowercase).
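The core of that compiling step is just flattening each 20x20 image into a 400-element vector and collecting the vectors into one JSON structure. A minimal sketch, using a fake all-black glyph in place of the real font images from the dataset:

```python
import json

# Sketch of the dataset-compiling script: flatten each 20x20 image into
# a 400-element vector and gather the vectors into one JSON blob, keyed
# by letter. The pixel data here is a fake placeholder; the real script
# read the rendered font images from the dataset instead.

def flatten(image):
    """Turn a 20x20 grid of pixel values into a 400-element vector."""
    return [pixel for row in image for pixel in row]

# A fake all-black 20x20 image standing in for one rendered glyph.
fake_glyph = [[0.0] * 20 for _ in range(20)]

dataset = {"a": [flatten(fake_glyph)]}  # letter -> list of vectors

blob = json.dumps(dataset)
print(len(dataset["a"][0]))  # -> 400
```

With 52 letter classes and up to 200 fonts each, thousands of these 400-element vectors add up quickly, which is why the resulting JSON file was around 80MB.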
After a few headaches from dealing with so much data, Wekinator was officially trained on 10,009 images, or ‘examples’. To test it, I drew some letters in Microsoft Paint and lazily read the pixel values using a udpreceive object in Max. Wekinator was able to predict the letters I drew with varying levels of success!
Here’s an example of me drawing a lowercase ‘h’ and Wekinator predicting it correctly (h is the 8th letter of the alphabet):
A nicer user interface
That Paint-and-Max method was clunky and ugly, so I came back to this the next day to make it a little more user-friendly. I painstakingly made a basic web app using ‘raw’ HTML and JavaScript, communicating through a program called osc-web to send and receive the appropriate values from Wekinator.
The user is presented with a 20x20 grid to draw on, and every second the value of each pixel is sent to Wekinator. Wekinator sends back a classification (a number from 1-52), and the app displays the letter it thinks you drew. It’s not perfect, but it’s a fun little project and it works well enough as a proof of concept.
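The last step on the app's side is turning the class number back into a letter. A small sketch of that mapping in Python, assuming classes 1-26 are lowercase a-z and 27-52 are uppercase A-Z (which fits the lowercase 'h' coming back as 8 earlier, though the actual ordering depends on how the training script enumerated the dataset):

```python
import string

# Map Wekinator's class number (1-52) back to a letter for display.
# ASSUMPTION: classes 1-26 are lowercase a-z and 27-52 are uppercase
# A-Z; the real ordering is set by how the training data was numbered.

LETTERS = string.ascii_lowercase + string.ascii_uppercase

def class_to_letter(class_number):
    """Convert a 1-based class number from Wekinator into a letter."""
    return LETTERS[class_number - 1]

print(class_to_letter(8))   # -> h
print(class_to_letter(27))  # -> A
```

In the web app the same one-line lookup happens in JavaScript whenever a classification message arrives from osc-web.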
Thanks for reading. More to come soon.