
Transcribing Speech Melodies

I’ve recently completed a “twitteropera” based almost entirely on speech melodies, and thought it might be worthwhile documenting some of the techniques that I’ve used.

Bitmaps and Vectors

There are a number of approaches one could take when it comes to transcribing speech melodies. Peter Ablinger, in his Words and Music for example, notates them against a fixed background grid – much the same way in which artists of old made drawings.

Albrecht Dürer – Man Drawing a Lute, 1523

Rather than this somewhat photographic approach, I’ve been interested in tapping into the underlying rhythmic pulse of the speech in question and using that as a point of departure for the transcription grid. Since I’ve been making the transcriptions as the basis for music to be performed by musicians, I’ve been motivated to find a method that enables them to feel the pulse from within, rather than following a click track or abstract pulse not related to the material.

Transcription Workflows in Ableton Live

The first step is recording or importing audio of the speech one wishes to transcribe. In the case of my twitteropera I collected material from a variety of sources, ranging from videos on the web to the text-to-speech voices on my Mac. Soundflower has done the job of routing those various sound sources into Ableton, although I wish I’d discovered Rogue Amoeba’s Audio Hijack earlier in the process, as it would have significantly simplified and sped up my workflow.

In this example I’ve selected a snippet of text from Maciej Cegłowski’s XOXO talk Thoreau 2.0.

On first import the file looks something like this:

First Import

My next step is to find a tempo that comes reasonably close to the basic tempo of the speech fragment. Speech typically fluctuates in tempo at least a little (and sometimes dramatically), but it helps to find a rough starting point nevertheless. A quick glance at the main peaks in the sound file (in conjunction with listening to the audio) can help determine where the main accents lie. The master tempo can then be adjusted so that those peaks fall close to the vertical tempo grid lines in Ableton Live. In this example “time is limited” halfway through the 3rd measure, followed by “but” and “you can’t be” in the 4th, provide some strong markers. At this stage I’m not too concerned that “time is limited” starts a little early, nor am I worrying too much about the time signature or a possible pick-up measure – those details can be adjusted later.
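The rough starting tempo can also be estimated numerically. The Python sketch below assumes that the accent times have already been read off the waveform by hand (the values in `accents` are made up for illustration) and that neighbouring accents lie about one beat apart. It only gives a first guess to refine by ear; it is not something Live does for you.

```python
from statistics import median

def estimate_bpm(accent_times, beats_between=1.0):
    """Rough tempo in BPM from accent times given in seconds.

    beats_between is an assumption: how many beats separate two
    neighbouring accents (1.0 means one accent per beat).
    """
    if len(accent_times) < 2:
        raise ValueError("need at least two accent times")
    gaps = [later - earlier
            for earlier, later in zip(accent_times, accent_times[1:])]
    # The median is less disturbed by one rushed or drawn-out word than the mean.
    return 60.0 * beats_between / median(gaps)

# Hypothetical accent times in seconds, not measured from the talk.
accents = [0.0, 0.5, 1.0, 1.5, 2.25]
bpm = estimate_bpm(accents)
print(round(bpm, 1))
```

If the accents turn out to fall on every second beat, passing `beats_between=2.0` doubles the estimate accordingly.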

Basic Tempo

The next step is to open the clip window and warp the sample so that the tempo can be adjusted without affecting the pitch. With ‘Warp’ activated Ableton places small grey markers where it analyses the main transients to be. Double-clicking on one of these creates a yellow marker which can then be dragged to the left or right in order to pin it to the specific location on the timeline one desires.

Warp Transients

In this case I’ve dragged the main accents to the closest grid line – Ableton can do this for you automatically, but I’ve chosen to make the adjustments manually. “Time” in the middle of measure 3 now falls exactly on the beat.
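What "dragging an accent to the closest grid line" amounts to can be sketched as a small calculation. This is an illustration of the idea only, not Ableton's own quantise routine; the function names, the accent time and the tempo are all assumptions.

```python
def seconds_to_beats(time_s, bpm):
    """Position on the beat grid for a time in seconds at a fixed tempo."""
    return time_s * bpm / 60.0

def snap_to_grid(time_s, bpm, step_beats=1.0):
    """Nearest grid position in beats; step_beats=0.5 snaps to eighth notes."""
    position = seconds_to_beats(time_s, bpm)
    return round(position / step_beats) * step_beats

# Hypothetical accent at 1.27 s with the master tempo set to 120 BPM.
accent_s = 1.27
snapped = snap_to_grid(accent_s, bpm=120.0, step_beats=0.5)
shift = snapped - seconds_to_beats(accent_s, 120.0)  # in beats, negative = earlier
print(snapped, round(shift, 3))
```

The size of `shift` is a useful check: a large value suggests the accent belongs to a different grid line, or that the starting tempo needs another look.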

Adjusted Markers

The timing of the speech has now been altered from the original to fit more closely with the grid structure. We can, however, return to the original timing while maintaining the underlying beat structure by clicking on the “Slave” button in the clip view, thereby setting the clip as the tempo “Master”. Opening up the master track and selecting song tempo should now show a greyed-out tempo automation line: these are the changes to the overall tempo that allow the clip to play back with its original timing. Right/context-clicking on the tempo automation window offers the option to “Unslave the Tempo Automation”, after which the automation line is shown in red:
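The relationship between the pinned markers and the resulting tempo curve can be made explicit. In the sketch below each marker is modelled as a pair of (original time in seconds, beat position on the grid); this representation and the sample values are assumptions for illustration and do not reflect how Live stores warp markers internally.

```python
def segment_tempos(markers):
    """Tempo needed between consecutive markers to restore the original timing.

    markers: list of (seconds, beats) pairs, sorted and strictly increasing.
    Returns a list of (start_beat, bpm) pairs, one per segment.
    """
    tempos = []
    for (t0, b0), (t1, b1) in zip(markers, markers[1:]):
        if t1 <= t0 or b1 <= b0:
            raise ValueError("markers must be strictly increasing")
        # Beats covered divided by seconds taken, scaled to beats per minute.
        tempos.append((b0, 60.0 * (b1 - b0) / (t1 - t0)))
    return tempos

# Hypothetical markers: two beats spoken in 1.0 s, the next two in 1.5 s.
markers = [(0.0, 0.0), (1.0, 2.0), (2.5, 4.0)]
curve = segment_tempos(markers)
print(curve)
```

Faster stretches of speech show up as higher tempo values and slower stretches as lower ones, which is the kind of stepped line the tempo automation displays.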


The important thing is that by coordinating the accents of the language with the grid structure of the clip we have a better basis for transcribing the pitches of the speech melody. I’ve also added a pick-up measure and some time signature changes in order to fit in with the accents of the speech more closely. In the old days I would transcribe the pitches by hand and place the MIDI notes accordingly. Since Ableton 9 this process has been made a lot easier with built-in sound-to-MIDI functionality. Context-clicking the audio file now brings up a menu from which one can select “Convert Harmony/Melody/Drums to New MIDI Track.” In this case “Convert Melody to New MIDI Track” makes the most sense, but “Convert Harmony” can also give some interesting results. Although Ableton does a pretty good job of making a basic transcription, it’s normally necessary to do a little manual cleaning up.
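For anyone transcribing by hand, or checking a converted note against the audio, the standard equal-temperament mapping from frequency to MIDI note number is a handy reference. The sketch assumes A4 = 440 Hz = MIDI note 69 and a frequency measured with some other tool; it is not a description of how Ableton's conversion works.

```python
import math

def hz_to_midi(freq_hz):
    """Fractional MIDI note number for a frequency, with A4 = 440 Hz = 69."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def nearest_note(freq_hz):
    """Closest MIDI note and the deviation from it in cents."""
    exact = hz_to_midi(freq_hz)
    note = round(exact)
    return note, 100.0 * (exact - note)

# Hypothetical measurement: a syllable hovering around 225 Hz.
note, cents = nearest_note(225.0)
print(note, round(cents))
```

Speech rarely sits on a tempered pitch, so the cents value shows how much is being rounded away when a gliding syllable is reduced to a single note.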


With a basic MIDI file now at hand the real work begins: filtering, simplifications, orchestration, dynamics. Certain parts of the tempo automation might be removed or smoothed out (making adjustments by ear) or a single flat tempo could be chosen for the entire file – adjustments that wouldn’t be necessary when working electronically, but that may make sense when preparing material for ‘real-life’ musicians to perform.
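The two tempo simplifications mentioned here, a single flat tempo or a smoothed curve, can be sketched as follows. The marker format (seconds, beats), the sample values and the moving-average window are assumptions chosen for illustration; in practice the choice is made by ear.

```python
def flat_tempo(markers):
    """One average tempo for the whole passage, from first to last marker.

    markers: list of (seconds, beats) pairs in increasing order.
    """
    (t0, b0), (t1, b1) = markers[0], markers[-1]
    if t1 <= t0:
        raise ValueError("last marker must come after the first")
    return 60.0 * (b1 - b0) / (t1 - t0)

def smooth(tempos, window=3):
    """Centred moving average; the window shrinks at both ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(tempos)):
        chunk = tempos[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical values, not taken from the transcription above.
markers = [(0.0, 0.0), (1.0, 2.0), (2.5, 4.0)]
overall = flat_tempo(markers)
gentler = smooth([120.0, 80.0, 100.0])
print(overall, gentler)
```

A flat tempo keeps the total duration but gives up the internal rubato; smoothing keeps some of the ebb and flow while removing the sharpest jumps.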

Flattened Tempo and MIDI Detail

Here’s what it sounds like:

And here is the same example in the context of my twitteropera:


Rudiger Meyer is a composer interested in the play between traditional concert music and new media.