Sunday, April 9, 2017

Turning Pictures Into Music

Some people use their spring break to go on vacation to New York or publish a science paper. I chose to spend mine making some bullshit on my computer and resurrecting this blog, which I haven't posted on in months.

Anyways, I'm back with some more nonsense. And it's some unusually big-time nonsense. No neuroscience here, not really data science, and you won't be inspired. There is literally no real purpose for this project except that we thought it would be cool.

To fully experience this post, you'll need sound playing from your device and no fear of being judged by those around you.
Also, I'm using a weird video format, so you'll have to turn up the volume on the sketchy videos.


TLDR (too long, didn't read):

My friend and fellow graduate student Vivek and I made a little algorithm that turns pictures into songs. For instance, I can give our algorithm a picture like this handsome logo for the Northwestern University Interdepartmental Neuroscience Program:

and it'll make a nice little song like this:


Yup. That's about it. Dozens of hours of work...

If you're still with me, this post will explain how we came up with this idea, show you some weird youtube videos, provide some examples of what our thing can do, then explain how it works. If all goes as planned, this should all amount to a substantial waste of time for both you and me. I am confident that we can achieve this goal.


A few weeks ago I was talking with Vivek about levels of meaning in music. When you listen to a song you can just enjoy the nice sounds, but it's so much cooler when you know something about what's going on behind the curtain. If you know about music theory, or the artist who wrote the music, or the context a song was written or performed in, it makes the experience a little richer. For example, when you learn that Dark Side of the Moon synchronizes with The Wizard of Oz, or that Closing Time by Semisonic is not about a bar but about the birth of the singer's child, it adds a whole new layer of meaning to a song you already liked. We thought that was an interesting concept.
Then I stumbled onto some really interesting YouTube videos that demonstrate this idea of embedding meaning into a song in a totally new and fascinating way. In the following videos, the artists have hidden pictures inside their music.

This group engineered their music so that when it is played through an oscilloscope it makes pictures that go along with the music (~3:20 is especially cool IMO):

Shrooms by Jerobeam Fenderson

So what you're seeing in this video is literally the waveform of what is being played out of your speakers. Most songs would look like indiscernible noise on an oscilloscope, but this music was made with the purpose of the actual waveform being visually interpretable. I am perpetually amazed by the things that engineers can do. Details here:

Here's another one, where the artist inserted a picture of his face into the spectrogram of the music. "Music" has a very loose definition here; I don't find this 'song' to be particularly musical. But the important thing is that he somehow engineered the track to play a series of tones at specific amplitudes and frequencies in the middle of the song that trace out this picture (skip to 5:30 for the face):

Equation by Aphex Twin

Also I just now remembered seeing this super weird piece of sheet music that someone left in the jazz band room in high school:

It's kind of the same idea of embedding visual aspects in music. Not super relevant, but whatever. It's neat and worth sharing. Don't yuck my yum.
(Info here:

Then, while I was working super hard in lab and definitely not browsing reddit, I found this meme where people were making MIDI files out of pictures. In other words, people take a simple line image and turn the lines into notes in music software like GarageBand, where horizontal position becomes the timing of the notes and vertical position becomes their pitch.

Drawing pictures with music - Andrew Huang

To do this, the guy in the video printed out the image he wanted, traced it by hand into the music production software on his computer, and adjusted the notes to make it sound nice. As people who understand a small amount about computers and music theory, Vivek and I thought this was cool, but realized that we could do it ~algorithmically~, because what the world needs right now is an algorithm that saves artists the precious time it takes to turn pictures of unicorns into music. So we chose to bear this great burden. You're welcome, world.
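The core mapping is simple enough to sketch. Here's a toy version (illustrative, not our actual code) that treats a binary image matrix as a piano roll: each column is a beat, each row is a pitch, and any lit-up pixel becomes a note:

```python
import numpy as np

# Toy "image": a 5x5 diagonal line. Rows map to pitch (top row = highest
# note) and columns map to time (one column per beat).
image = np.eye(5, dtype=int)

notes = []
for col in range(image.shape[1]):      # each column is one beat
    for row in range(image.shape[0]):  # each row is one pitch
        if image[row, col]:
            pitch = 72 - row           # top of the image = MIDI note 72 (C5)
            notes.append((col, pitch)) # (beat, MIDI pitch)

print(notes)  # the diagonal becomes a descending run:
              # [(0, 72), (1, 71), (2, 70), (3, 69), (4, 68)]
```

A diagonal line in the image turns into a descending run in time, which is exactly the trick from the video, minus the tracing by hand.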


We did it.

And not only can we process an image into a song, we can tweak features of the image to make it sound a little nicer. For instance, this is what the NUIN logo sounds like if we don't edit it:


This sounds pretty bad. But you can take bad things and make them better, like me at doing grad school. In the unedited version we're playing too many notes at a time, with no organization to their tonal quality, so it sounds like shit. It's like erratically pressing random adjacent chunks of keys on a piano. But we can fix that.

Here's what the NUIN logo sounds like if we shift all of the notes to notes in D# major and play them with a more forgiving instrument:


And we can add chord changes, so that every 16 beats we switch to a new key in a pattern. Dare I say that this sounds almost kind of good?


Here are some Examples:

Lax in C# Major.

Doge Blues (I tried running this on a picture that wasn't a line drawing. It wasn't great at picking out the important features in the image, but it sounds pretty cool.)


Northwestern Wildcats Electronic Odyssey.
Powerhouse of the cell for bass

Poop in C Major for Trumpet Ensemble


And this:


Also this:

OK that's enough of that. 

Let me know if there are any images that you're dying to hear and I will consider making them for you! (Or just use the cool open-source algorithm we made.)

(All code was written in Python and can be downloaded from github at

This project seemed complicated at first but we made it easier by splitting it into just a few discrete, not-too-hard parts:

1) Get the important parts out of an image and turn it into a matrix.
2) Manipulate the matrix to clean up the picture and make the corresponding notes sound nice.
3) Turn the matrix into a midi file that can be played by a music program.

1) It is easy to take an image and turn it into a 3D matrix of RGB values for each pixel using the scikit-image library. But you can imagine that if we 'played' an image like this it would sound totally noisy and crazy, because every pixel would have a value corresponding to a note. Instead, we filtered the image for its prominent edges and only included those in the matrix. We did this using a Sobel filter. I can't claim to fully understand it, but roughly it approximates the image's intensity gradient, so it lights up wherever brightness changes sharply, and it works pretty well. It outputs a 2D matrix where edge intensity is coded as a value between 0 and 1, so faint edges play softer notes, like the second half of the brain in the NUIN logo and the second N in NUIN. (It also had a little trouble with the U for some reason.) We also built some parameter-tuning functions into our code so that we could threshold which edges to include. I had to play with this a lot to get the doge to work. That was time well spent.
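Here's a rough sketch of step 1 (using scipy.ndimage's Sobel on a synthetic image rather than our actual scikit-image code, and a made-up threshold of 0.2):

```python
import numpy as np
from scipy import ndimage

# Synthetic grayscale "image": a bright square on a dark background.
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0

# Sobel edge magnitude: combine horizontal and vertical derivatives.
sx = ndimage.sobel(img, axis=0)
sy = ndimage.sobel(img, axis=1)
edges = np.hypot(sx, sy)
edges /= edges.max()       # normalize edge intensity to the range 0..1

# Threshold out faint edges (a tunable parameter; this is the knob we
# fiddled with a lot for non-line-drawing inputs like the doge).
edges[edges < 0.2] = 0.0
```

The flat interior of the square drops to zero and only the outline survives, which is what you want if the outline is going to become notes.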

2) The next challenge was to take the matrix and manipulate it so that it maintains the visual picture but sounds a little better. First we built simple functions to resize and pad the x and y dimensions of the image using scipy.signal.resample. This let us choose how many beats the image plays for and the range of notes used to play it. The more interesting methods we made could set the starting note and the key of the song, and these could be called repeatedly to make chord changes.
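A minimal sketch of step 2 (the array sizes, the D# root, and the helper name are all illustrative, not our exact code): resample the edge matrix to the number of pitches and beats you want, then snap each row to a scale degree so every note lands in the key:

```python
import numpy as np
from scipy.signal import resample

# Stand-in for the edge matrix from step 1: 100 pitch rows x 300 time columns.
edges = np.random.default_rng(0).random((100, 300))

# Resize to 24 playable pitches and 64 beats.
n_pitches, n_beats = 24, 64
resized = resample(resample(edges, n_pitches, axis=0), n_beats, axis=1)

# Map rows onto a major scale so everything stays in key.
MAJOR = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of the major scale
root = 63                        # MIDI 63 = D#4; change this to change key

def row_to_pitch(row, n_rows=n_pitches, root=root):
    degree = n_rows - 1 - row            # top row = highest note
    octave, step = divmod(degree, 7)     # 7 scale degrees per octave
    return root + 12 * octave + MAJOR[step]
```

Calling this with a different `root` every 16 beats is all a chord change amounts to here.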

3) Lastly, we found a library that writes notes to MIDI files, a format that is pretty much a piece of sheet music for an electronic instrument (or soundfont) to play. We iterate through the picture/music matrix and dump each note into a MIDI file with a function that we wrote, then drop the MIDI into GarageBand to play the 'song.' There's probably a fancier way to programmatically make videos of the MIDI being played, but whatever. We tried, and we started writing some fancier algorithms to change more things about the picture and music, but then we realized that we had spent way, way too much time on this 'project' and it was time to stop.
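The matrix-to-notes half of step 3 can be sketched like this (simplified; the function name and details are made up for illustration, not lifted from our repo):

```python
import numpy as np

def matrix_to_events(m, pitches, velocity_scale=100):
    """Turn a pitch-by-beat intensity matrix into MIDI-style note events.

    Each nonzero cell becomes a note; consecutive nonzero cells in a row
    are merged into one held note. Returns (pitch, start, duration, velocity).
    """
    events = []
    for row, pitch in enumerate(pitches):
        col = 0
        while col < m.shape[1]:
            if m[row, col] > 0:
                start = col
                velocity = int(round(m[row, col] * velocity_scale))
                while col < m.shape[1] and m[row, col] > 0:
                    col += 1               # extend the note while cells stay lit
                events.append((pitch, start, col - start, velocity))
            else:
                col += 1
    return events

# Two-row toy matrix: row 0 plays MIDI 60 (C4), row 1 plays MIDI 64 (E4).
m = np.array([[1.0, 1.0, 0.0, 0.25],
              [0.0, 0.5, 0.5, 0.0]])
events = matrix_to_events(m, pitches=[60, 64])
print(events)  # [(60, 0, 2, 100), (60, 3, 1, 25), (64, 1, 2, 50)]
```

Each tuple maps directly onto a typical MIDI writer's note call, e.g. midiutil's `addNote(track, channel, pitch, time, duration, volume)` (midiutil is just one example of such a library, not necessarily the one we used).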

All of our code is written in Python and can be found here:

Feel free to use any of it but if you somehow find a way to make money with this then please give us some.

What did we learn?
Absolutely nothing. But I hope that you had fun and that someone you want to impress didn't see you listening to the song made by the poop emoji.

Since I posted this, a few people have pointed out other related media: Pictures at an Exhibition is a piece by the Russian composer Mussorgsky, written after he was inspired by an art exhibit. Several movements in the work each represent a piece of art from the exhibit. (Thanks Sean McWeeny!)

The results of this algorithm have been likened to the work of La Monte Young, who has been called one of the greatest living composers and whose work calls into question the nature and definition of music. (Thanks Claire Chambers)
