What makes Bach sound like Bach? New dataset teaches algorithms classical music

savanaly · on Dec 5, 2016

The question of what makes Bach sound like Bach is, needless to say, not addressed.

The actual thing they're reporting is:

'“You need to be able to say from 3 seconds and 50 milliseconds to 78 milliseconds, this instrument is playing an A. But that’s impractical or impossible for even an expert musician to track with that degree of accuracy.”

The UW research team overcame that challenge by applying a technique called dynamic time warping — which aligns similar content happening at different speeds — to classical music performances. This allowed them to synch a real performance, such as Beethoven’s ‘Serioso’ string quartet, to a synthesized version of the same piece that already contained the desired musical notations and scoring in digital form.

Time warping and mapping that digital scoring back onto the original performance yields the precise timing and details of individual notes that make it easier for machine learning algorithms to learn from musical data.'

It also mentions that they attempted to apply existing deep learning algorithms designed for speech recognition to their new dataset, hoping to be able to accomplish a task such as predict a single missing note from a long string of notes. It does not say whether this worked.

jthickstun · on Dec 5, 2016

Hi savanaly,

When we talk about "what makes Bach sound like Bach," the technical concept we have in mind is the recent work in computer vision on style transfer. For example,

https://arxiv.org/abs/1508.06576

We are excited to work on adapting these models to the musical domain!

As for note prediction, you can see our results in our paper:

https://arxiv.org/abs/1611.09827

Our are results for simple (2-layer, not very "deep" models); we were interested in understanding the low-level "features" of music rather than building a model that maximizes performance. Nevertheless, the results are quite promising; I'm confident that someone using our dataset with a deep network and a lot of gpus could blow our numbers out of the water! :)

Tutorials on how to set up and evaluate this task are available on our website:

http://homes.cs.washington.edu/~thickstn/start.html

dharma1 · on Dec 5, 2016

I'm an (ex-)musician playing with machine learning, this is very interesting, will check it out! Kudos for curating the dataset. So your goal initially is to basically build polyphonic transcription with CNN's?

I am starting to record my own dataset for solo jazz piano - all midi though. Monophonic melodies, and matching chord voicings and voice leading from one chord to the next. With the goal of learning to generate a good sounding jazz piano arrangement to a given melody with nothing except monophonic input.

Style transfer is good at essentially texture transfer - I suspect it won't work that well for understanding music theory (or text), especially with long time series dependencies, but will be very curious to see what emerges.

I'd like to hear more generative music samples from DeepMind's WaveNet too, the piano samples they published sounded very good, but it was unclear what the model had learned or generalised - and how much was semi-randomised recall. I haven't seen the open source implementations of WaveNet produce as good results yet - probably because it's computationally very expensive to train and run, and that limits experimentation. I saw Aäron give a talk on it a couple of weeks ago which helped me understand the stacked dilated convolutions - but would still like to hear more music examples :)

jthickstun · on Dec 6, 2016

Yes, we're starting with the transcription task. CNNs for local prediction are interesting, and we're also curious about capturing the temporal structure of music with something recurrent. It seems like a time series model that understands something about western music should help with music transcription just like language models help with speech transcription.

The style transfer stuff comes later and as you observe, we'll probably need some new ideas to make that work well. I haven't thought about this deeply yet, but my intuition is that maybe instrumental timbre is an audio analog of visual texture, so maybe a reasonably direct "port" of style-transfer to the audio domain would let us construct demos that, for example, rewrite a cello recording to sound like trombone.

Let us know when your dataset is complete! I love jazz.

ktRolster · on Dec 5, 2016

Albert Schweitzer pointed out (in https://www.amazon.com/dp/0486216314 ) that in many cases it's hard to understand Bach's music without understanding the lyrics. The mood of the music will change from cheerful to somber (or whatever) seemingly randomly, but if you understand the lyrics, it's not random.

gumby · on Dec 5, 2016

Schweitzer may have been a thoughtful guy but I think this case is a reach; after all the vast majority of Bach's work had no libretto at all (including for example all the organ fugues, and the Goldberg variations and Brandenburg concertos).

However a counterexample from my own life: I only learned German starting in my late 20s and when, 10 years later I heard the Matthäus-Passion and could understand the lyrics I wept...and I don't even know much about christianity.

Bud · on Dec 5, 2016

Professional Bach singer here: member of American Bach Soloists, Philharmonia Baroque Orchestra, Bach Collegium San Diego, Carmel Bach Festival.

It's actually not true that the "vast majority" of Bach's work had no texts. Bach wrote over 200 cantatas (with around 5-10 movements and separate texts each) plus an assortment of masses and 4 extremely large choral/orchestral works: the St. Matthew and St. John Passions, the Christmas Oratorio, and the B Minor Mass.

Looking at the catalog of all of Bach's works, which I have here, BWV (Bach Werke-Verzeichnis) numbers 1 thru 1071, you get all the way to BWV 525 before you even get out of the vocal works. Numbers 1 thru 524 are all cantatas, masses, oratorios, lieder, the many chorales of course, secular and comic cantatas, etc. And of course many/most of these are far larger than the individual organ works.

(Bach actually wrote a lot more cantatas than this; but only around 224 of them survived. Another hundred or so were lost.)

gumby · on Dec 5, 2016

Thanks for correcting my hyperbole, which I suppose is due to my bias for his keyboard works which I like to play (at home -- I doubt anyone suffer through my playing). I do like the masses and passions though, so you are spurring me to listen to more choral work!

Bud · on Dec 5, 2016

Totally check out the cantatas; many of us feel that the cantatas are the real heart of Bach's work. Bach was fundamentally a church musician.

sumpmonster · on Dec 5, 2016

If you have an iPhone or iPad I would reommend the Bach cantatas app to learn about Bachs cantatas:

http://www.cantatasapp.com and https://appsto.re/de/HecH5.i

gaur · on Dec 6, 2016

How many of the lost hundred cantatas do you think contained new (unknown to us) music, as opposed to music that was parodied in a surviving cantata?

ktRolster · on Dec 6, 2016

Most of them, probably

devindotcom · on Dec 5, 2016

This is my understanding too, his choral works are many. They were part of the Bach family factory though, IIRC, so might not be the best data points for learning what makes Bach Bach note by note.

Bud · on Dec 5, 2016

Not so; the cantatas are authentic J.S. stuff.

ktRolster · on Dec 5, 2016

I heard the Matthäus-Passion and could understand the lyrics I wept

Which parts made you weep?

gumby · on Dec 5, 2016

So I didn't really know the Jesus story, just that it was when one of their gods was killed. But the text makes it a human event in which his friends suffer.

It even begins with him being a bit of a prig as his acolytes express their unhappiness. But then he himself expresses pathos (Meine Seele ist betr ̧bt bis an den Tod, bleibet hie und wachet mit mir.) The story becomes unfair. I really felt the unfairness of his trial.

It's a much more sympathetic story than, say, Plato's recitation of the death of Socrates, who comes off as a jerk.

Bud · on Dec 5, 2016

I would make a list, but it'd be about 25 items long. ;)

Start with "Eli, Eli, lama sabachtani" and "Wahrlich, dieser ist Gottes Sohn gewesen" and "Und von der sechsten Stunde an ward eine Finsternis über das ganze Land bis zu der neunten Stunde" and...see? I can't stop once I get started.

irrational · on Dec 5, 2016

Except "Eli, Eli, lama sabachtani" are Greek transliterations of Hebrew/Aramaic so... what does that sound like to you in German?

stan_rogers · on Dec 5, 2016

That's translated inline in the original Greek, and in every subsequent translation. One does not need to understand the Aramaic, just to be familiar with the verse containing it.

haberman · on Dec 5, 2016

This is really interesting, but it confused me at first. Given the title and the problem posed in the article's intro, I figured this would be a dataset of sheet music, ie. the notes and durations specified in some printed music. However, reading more it appears to be focused on recordings (ie. audio) and annotating those recordings with information about where each note starts and stops.

So to me this seems more directly applicable to transcription (ie. taking audio and turning it into sheet music) or synthesis (taking sheet music and turning it into audio of a human-sounding performance) than it does to composition or finishing unfinished works by famous composers. The output of the compositional process is generally sheet music, not audio, so it seems to make more sense that problems around composition would be trained and learn in the sheet music domain.

I'm not a machine learning researcher though! This is just my impression as a musician.

jthickstun · on Dec 5, 2016

Hi haberman,

I'm one of the authors on this paper. You're right that the most direct applications of this dataset are transcription and synthesis. One of the cool aspects of end-to-end learning models is that they discover a "representation" of data that can be useful when applied to other tasks. We speculate about some tasks like recommendation and composition on our website:

http://homes.cs.washington.edu/~thickstn/musicnet.html

We're also interested in music like jazz and pop, for which good scores are often unavailable. Classical music is nice for training models because we can use sheet music as labels to learn a representation. Many aspects of this representation, such as rhythm and harmony, may transfer to other musical genres. Learning about classical recordings could bootstrap learning for other kinds of musical audio.

So while you're right that it's probably easier to learn a model to complete Bach using symbolic sheet music, we feel that addressing complex tasks directly from raw audio is worthwhile!

haberman · on Dec 5, 2016

This news reminded me of this gem, from back in the day (I think 1996-ish):

http://www.markheadrick.com/midi/absmfaq.txt

In section 1.4 they very emphatically state that "with current technology, IT CAN'T BE DONE."

They conclude: "Think of it this way: If you don't mind spending more than the US national debt on computer equipment and waiting a few years for the job to complete, you can have a system that MIGHT accurately convert the digital waveform data of a 5 minute song into a small, compact MIDI file.

Otherwise, you can blow a couple of thousand dollars hiring a professional band of studio musicians and engineers who can probably give you what you want in about one day."

It is humorous for its emphatic-ness, but also educational for being a picture into how we've historically thought about this problem.

pierrec · on Dec 5, 2016

This announces a new dataset where recorded performances are precisely synchronized to MIDI transcriptions. Obviously the article doesn't seem to get the implications quite right (it's very useful for performance-related research, not so much for AI composition).

As a composer, the coolest potential I see here is training a model to create realistic mockups from MIDI compositions. For that purpose, though, it would be better to start with a fully monophonic/solo-instrument dataset, which would simplify the learning. Also, MIDI data is not entirely sufficient: annotations on dynamics and playing technique would be necessary to make a good mockup tool, since this is the kind of information one might even give to human performers.

Anyways, it would be tough for such a tool to catch up with current state-of-the-art, sample-based mockup tools, which are already baffling in their realism, although they usually require a lot of work to get good results. But one can always dream of a "Stokowski" or "Karajan" neural network that interprets your MIDI composition with emotion and sensibility!

mrcactu5 · on Dec 5, 2016

i ran into a few issues trying to study classical music with a computer. first of all is merely putting in some representation of the musical score into a file. This was accomplished by MIDI but I am hoping for a more standard way that looks more like the notes of a score.

Another problem is once you have the music there's a tremendous amount of "interpretation" that a musician does. the nodes may each read 1/8 but a musician might add or subtract 1/64 has he/she feels is good.

other times the change is more mathematical 1/8+1/8+1/8 might have to actually be read 1/12+1/12+1/12 = 1/4 but that is much easier to fit into a computer

I have said nothing of dynamics (loud/quiet), articulation (stoccatto, slurring etc).

scores are available in IMSLP and other sources. but are computer files available as well?

ktRolster · on Dec 5, 2016

You might look at GNU Lillypond, which might be the type of representation you are looking for, since it can be made to look like the notes of a score.

Here are some collections of music in that format:

http://www.mutopiaproject.org/

https://github.com/trending/lilypond

gattilorenz · on Dec 6, 2016

It is interesting to note that in 1990 there was an expert system composed of a myriad of handmade rules that could produce Bach-like harmonizations.

http://www.global-supercomputing.com/people/kemal.ebcioglu/p...

Unfortunately I can't seem to find the samples now, but to my (untrained) ear they sounded as Bach as the real thing.

Gaussian · on Dec 5, 2016

Professor David Cope of UCSC has done extensive work in this space, starting with his EMI algorithm. His algos + DBs have created some in incredible music in the Bach style.

lalos · on Dec 5, 2016

Relevant project to train a model that generates Bach music

http://bachbot.com/