↬ Twenty-two – 06: Voice Models, Battles

Hi, I’m Rudiger Meyer and this is my monthly newsletter covering what I’ve been up to and what’s been catching my attention.

Bastian Allgeier, the creator of the Kirby CMS that my website is built on, recently gave a talk at the Beyond Tellerrand conference. It wasn’t about web stuff though, but rather, as he described it, a step outside of his comfort zone to talk about our responses to tackling the climate crisis. One of the points he touched on (around 14′32″ in the video) was how we end up abandoning taking action when overwhelmed with the immensity of the task. Faced with an ever-present stream of bad news and apparent lack of progress we become numb to the problem, (unconsciously) tuning it out. Perception of the dire aspects of the situation needs to be amplified to drum up political action, and that at the same time can overwhelm any sense of progress and lead to us feeling blocked and paralysed when it comes to taking actions of our own.

I’ve been struck by a similar dilemma when it comes to covering the war in Ukraine. The push and pull between the stories of Ukrainian success and desperation. The need to tell how well they’re doing to show that they’re worth supporting, and at the same time how badly it’s going in order to garner further support. The West dithering

I’ve been following the newspapers less, getting more out of other sources – Dmitri Masinski’s translations, for example. I mentioned him in my previous newsletter, and he has since put up a website – wartranslated.com – in addition to his tweet threads. It covers much of the same material but in a form that’s a little easier to read than the tweet fragments, and with the added advantage of being able to add a feed to my RSS reader – more on that later. Besides his numerous translations, of which the Girkin and Arestovych posts continue to provide an interesting picture from both sides, Dmitri has also posted a few ‘opinion’ pieces of his own – What might be the average Russian’s worldview?, for example.

I’ve also been listening to The Eastern Border, a ‘Soviet History Podcast’ now primarily focussed on the war in Ukraine, by the Latvian Kristaps Andrejsons. (The signature music for the podcast, appropriate as it might be, is quite something to take – I secretly wish he’d used Holger Czukay’s Der Osten Ist Rot :-) Certain episodes cover some of the translations published by Dmitri, and a recent episode tackled the topic of famine and the impending food shortages that the invasion has provoked, with special guest Danijela Žuvela from The Red Line Podcast – which is its own (very well produced) rabbit hole. Will Russia Trample into Moldova? for example.

(It’s strange to listen to the ‘Comrade’ greetings on The Eastern Border, a greeting that I recognise from certain circles having grown up in South Africa. The world feels like a very different place now. The optimism of the mid 90s when I first moved to Europe now on its head. I do still feel optimistic, but the current mood is definitely another.)

* * *

The Battle for the Mural — and the Future of Belarus appeared in my NYT The Daily podcast feed as a ‘Sunday Read’.

Sarah A. Topol’s story of an act of graffiti at a playground in Minsk that “turned into a remarkable campaign of defiance against an increasingly totalitarian regime” is read by Julia Whelan in a recording by Audm.
The “read by” is the interesting part, as I quickly began to think that it was most probably an audio ‘model’ of Julia’s voice, something along the lines of the text to speech models that Everest Pipkin explored in their Shell Song. (See my August 2021 newsletter.) Everest investigated the extensive industry around the creation of custom text to speech voices – see their ‘making of’ presentation for the Open Data Institute.

The thing with Audm is that their selling point places great emphasis on their recordings being “read by humans”.

Our award-winning narrators are entrusted with voicing the most significant works of literature (think Joan Didion, Doris Kearns Goodwin, Robert Caro, etc.).
Thanks to them, listening to stories in Audm is a truly pleasurable experience. We wed the best content with the best voices in the industry.
Rather than using text-to-speech, however, Audm pays actual humans to read articles…

So why does the recording sound so much like one of the models that Everest describes? It takes over 100 minutes to listen to the Minsk graffiti story, and as wonderfully as the story is written it’s tough going listening to the audio version. I’m having a difficult time believing that Julia Whelan read all that – unconvinced that it’s not just a model of her voice. (Reading the story ‘in print’ on the NYT site also has the advantage of it being interspersed with a series of photographs that add a lot to its unfolding.) Julia also narrates This Was Trump Pulling a Putin, another NYT ‘Sunday Read’. Again tough going aurally, though a little more manageable clocking in at 54 minutes. Julia has some narration examples on her own site that can be used as a point of comparison. Her style is distinctive, but I’m pretty sure the NYT stuff is a computer model. 100 minutes without any large scale shifts in pitch doesn’t sound human. Surely the base tone/ambitus should shift slightly over that amount of time? Might be worth undertaking a little analysis.
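A rough idea of what that little analysis could look like, as a minimal sketch only. It assumes a per-frame pitch track (fundamental frequency estimates in Hz) has already been extracted from the recording by some pitch tracker, which is not shown here. The function names, the 110 Hz reference and the window size are all my own hypothetical choices. The sketch summarises each stretch of the recording by its median pitch and its ambitus (highest minus lowest pitch, in semitones), so that one can see whether the base tone wanders over the course of the reading.

```python
import math
from statistics import median


def hz_to_semitones(f0_hz, ref_hz=110.0):
    """Convert a frequency in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def windowed_pitch_summary(f0_track, frames_per_window):
    """Summarise a pitch track window by window.

    f0_track: per-frame pitch estimates in Hz, with None or 0 for unvoiced
    frames. It is assumed to come from an external pitch tracker.
    Returns one (median_semitones, ambitus_semitones) tuple per window
    that contains at least one voiced frame.
    """
    summary = []
    for start in range(0, len(f0_track), frames_per_window):
        window = f0_track[start:start + frames_per_window]
        voiced = [hz_to_semitones(f) for f in window if f]
        if not voiced:
            continue  # skip windows of silence or unvoiced speech
        summary.append((median(voiced), max(voiced) - min(voiced)))
    return summary


def median_drift(summary):
    """Spread between the highest and lowest window medians, in semitones."""
    medians = [m for m, _ in summary]
    return max(medians) - min(medians)
```

A drift close to zero across, say, five-minute windows of a 100 minute recording would fit the impression of a voice that never shifts its base tone. It would not prove anything on its own, since a trained narrator might also be very consistent, so Julia’s narration samples would be needed as a point of comparison.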

* * *

Anna Cornelia Ploug presented ‘Læseblokadens erotik’ (The eroticism of reader’s block), a little essay of hers, on the Atlas magazine podcast. Finding herself more challenged by reading assignments than by producing her own writing, she found some answers in Myrestier – Figurer for skrift og læsning i antikkens Grækenland (Figures for Writing and Reading in Ancient Greece), a book by Jesper Svenbro.

The premise is that back then reading out loud was primarily a form of distribution – to give sound and voice to a text that could not itself reach its audience, but first needed to be made physical. Because of this, narration was regarded as a degrading activity. Something one might make use of slaves to carry out.

(No wonder we move towards assigning this task to computer generated models of our voices. Everest also touched on the literal slavery that goes into generating the voice models. On that tack see also Ben Werdmüller’s post on the modern day slavery that many of the tech giants make use of.)

She considers this the possible root of the unease that can arise with reading (out) another’s text – one has to put oneself aside, temporarily give up one’s own identity. Reading out loud one lends one’s voice to something one doesn’t necessarily agree with. And then there is also the difference between the one reading the text and those following it. Anna argues that one can reintroduce symmetry into the relationship when the desire flows both ways, not just the (erotic) desire of the author to ‘penetrate’ the listener, but a consensual space opened up by both parts.

If that seems all too academic, it struck me how well it applies to propaganda – where there is a violence in the fantasies and program of the ‘penetrating part’, well illustrated by the Russian bots that overwhelm any #ukraine hashtag these days. Or any statement from the Russian government channels for that matter.

* * *

The speech models are not immediately what springs to mind when thinking of A.I (better described as machine learning?), but that’s what they are. Jamie Lidell had a chat with Holly Herndon on his Hanging Out With Audiophiles podcast. She’s been doing a lot with music and technology, creating a Holly+ version of her voice that others can upload files to and use as an instrument. Chart magazine had a chat with her too a few years back, but that’s a slightly more awkward interview.

Jamie starts off the podcast with a little tour of his piano tuned in a mean-tone temperament, comparing equally-tempered triads with their mean-tone counterparts. It’s fun, although I wish that he’d moved through them by fifths rather than chromatically.
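For anyone curious about the numbers behind that comparison, here is a small sketch. It assumes quarter-comma mean-tone, the most common variety (Jamie’s exact tuning may well be different), in which the fifth is narrowed to the fourth root of 5 so that the major third comes out as a pure 5/4. The function names are my own. It also lists the twelve roots in the order of a walk by fifths rather than a chromatic one.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Assumption: quarter-comma mean-tone, where four fifths stacked up
# (less two octaves) give a pure 5/4 major third.
MEANTONE_FIFTH = 5 ** 0.25


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents to the octave)."""
    return 1200.0 * math.log2(ratio)


def equal_tempered_triad(root_hz):
    """Major triad with 4 and 7 equal-tempered semitones above the root."""
    return [root_hz * 2 ** (n / 12) for n in (0, 4, 7)]


def meantone_triad(root_hz):
    """Major triad with a pure third and a quarter-comma narrowed fifth."""
    return [root_hz, root_hz * 5 / 4, root_hz * MEANTONE_FIFTH]


def roots_by_fifths(start=0):
    """The twelve pitch classes ordered by ascending fifths, starting on C by default."""
    return [NOTE_NAMES[(start + 7 * i) % 12] for i in range(12)]


if __name__ == "__main__":
    et = equal_tempered_triad(220.0)
    mt = meantone_triad(220.0)
    print("third: ET %.2f cents, mean-tone %.2f cents"
          % (cents(et[1] / et[0]), cents(mt[1] / mt[0])))
    print("fifth: ET %.2f cents, mean-tone %.2f cents"
          % (cents(et[2] / et[0]), cents(mt[2] / mt[0])))
    print(" ".join(roots_by_fifths()))
```

The equal-tempered third is about 14 cents wider than the pure one, which is the difference one hears as a slight shimmer in the equal-tempered triads. Note that the sketch only covers the ‘good’ mean-tone triads. On a real twelve-note keyboard the triads further round the circle of fifths get progressively rougher, which is what a walk by fifths would have shown off.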

W. David Marx, writing from Tokyo, touched on Future Sounds: Cheap Gear and Easy Samples in his Culture, a manual newsletter. How A.I has been carving out new territory, transforming the idea of a ‘sample’ into something else when the entire drum (or whatever) track can be isolated from a song for example.

The startup Audioshake’s AI pulls out the “stems” out from any piece of music, so you can input a fully orchestrated, mixed-down track and within seconds, get only the drums, or only the vocals, or only the guitar — an advanced technology indistinguishable from magic. While Audioshake is theoretically for your own songs, this augurs a near future when we can all pull every drum track out of the entire James Brown catalog. The entire genre of hip-hop developed from the fact you couldn’t pull out specific sounds from a song, so DJs were limited to a small set of recordings that featured breakbeats.

He kicks off the newsletter with the question “Does the democratization of electronic music tools change how we hear the sounds?” mentioning the Roland Juno 6, his first synth, which happens to be one of the first synths (the Juno 60 to be precise) that I started out with as well – so close to my heart. He covers how previously difficult to obtain sounds are now readily available to all in the form of soft synths. The same might be said for many pieces of ‘unobtainium’ audio gear. The Polish Radio Studio filters recreated by Felt instruments for example. (Used extensively on last year’s Flowmatic Blood Moon podcast with Shadi Bazeghi and Mansoor Hosseini.)

With synths getting cheaper and sampling about to get incredibly easy, democratization and exhaustion are serious specters haunting music production. But these aren’t just issues for electronic music: they’re the central cultural questions of our time.

Robin Sloan has also done a bit of experimentation with music A.Is and collected some thoughts of his own on A.I art in general. While he still seems to maintain some enthusiasm for music, writing doesn’t come off all that well:

In all my work with language models since 2016, nothing has approached this feeling, and I think I am ready to close the loop, finally: I don’t believe AI tools are useful for serious writers. I can’t provide an explanation for this difference; not even much of a theory, honestly. I’m just reporting my findings.

I gave talks about the fun and potential of AI-assisted writing; the New York Times wrote about my tools, yikes! Back then, I was still tinkering with hyperparameters more than I was, uh, actually writing. It was the actually writing that clarified the situation.

In retrospect, I understand that it was the language models themselves that captivated me, rather than the words they produced. Their bullshit acknowledged, these models remain, for me, provocative “objects”; complex scintillating gems; computational Infinity Stones. The good stories aren’t in them, but about them: their creation, possession, loss, recapture. Maybe a curse or two.

That’s the paradox of AI art: it leverages access to the spigot of infinity to produce a sense of scarce invention. In an overstuffed audiovisual landscape, it’s the “AI” and not the “art” that provides a reason to look at this and not that, listen to this and not that.

He has some thoughts on synths too:

AI art recalls the early days of synthesizers, perhaps; what was Switched-On Bach if not “I see what you did there”? I hope that analogy is right, because the synth provides a healthy, sustainable template for this genre.

But the Switched-On Bach aspect (“it’s the “AI” and not the “art” that provides a reason to look at this and not that”) is perhaps the least interesting, least sustained aspect of synths – a quirk of the early days long since surpassed.

Ubiquitous and unremarkable, controllable and hackable, with flavors ranging from fully corporate to gloriously DIY … I’m realizing, as I type this, that synthesizers might be one of the truly utopian technologies.

* * *

Returning to David Marx’s Culture, a manual with his post on Art versus commerce in the NFT Era:

So we’ve seen a clear intellectual shift in the last 170 years around art and commerce:

  1. True art can’t be commercial
  2. True art shouldn’t be commercial
  3. Art can still be good even if it’s commercial
  4. Commercial art is the best art
  5. Commerce is art

The NFT community’s claim to resolve the conflict between art and commerce is true: They flipped the entire logic on its head.

With music things are perhaps a little different. Holly Herndon’s Holly+ is linked to a bunch of blockchain and NFT stuff and she provides a page covering some of the issues around voice model ‘rights’, for example. I have to grant that it has more of a claim to being ‘the future’ than the visual art NFT stuff.

For a pretty damning review of that see Ben Davis’ I Looked Through All 5,000 Images in Beeple’s $69 Million Magnum Opus. What I Found Isn’t So Pretty on Artnet News.

To repeat Robin Sloan: “The good stories aren’t in them, but about them.”

Or Craig Mod in some now deleted tweets:

has anything gone from
hmm, interesting
neat, what can we do with this
Infinite Dumb Money
collective waking up
violently UNCRITICIZABLE by True Believers
utterly untouchable
as fast as nfts? (the above was ~12 mo or so)

Of course there’s something interesting in the base tech, and might prove useful going forward, but the fever dream of last summer feels a thousand years in the past; heck, it’s so weirdly toxic you don’t even see folks using the hexagon avatars to strut their warez

nfts are a good example of how money/monetization as a prime attribute corrupts a space; a good heuristic here is asking “if money wasn’t involved, would this have been interesting?” and mostly, the answer is a resounding no

“yes but at least the art was wonderful” said no one

Why am I even talking about this again?

* * *

There’s a new ‘poetisk podcast’:

EFTERTIDEN: Niels Frank has reworked his poetry collection Små guder (Small gods) for the podcast, and Rosanna Lorenzen has composed the music: “Hovering around the words, whispering and bubbling, at once distant and near, warm and cool.”

* * *

In my previous newsletters I’ve mentioned how impressed I was by the synthesised sounds of John McGuire’s Pulse Music, and was delighted to discover that Unseen Worlds provide some extensive Liner notes and an Interview on their site.

The sounds were apparently inspired by a Dan Flavin installation:

My reaction was to start looking for ways to make sounds “glow.” After much trial and error I decided on the following: given that the new synthesizer at the WDR—an EMS Synthi 100—had six voltage-controlled sine-tone oscillators the sounds are composites of six sine tones: a fundamental and five partials tuned in whole number relationships, for example 1-3-4-6-9-18. The fundamental (1) is very soft; the partials increase in loudness from lowest to highest (18). In addition, each of these sine tones has a different onset transient, increasingly hard, or sudden, toward the top. This particular relation of loudness to onset transients seemed to lend the sounds at least some of the imagined glow.

I should emphasize that there was very little theory behind any of this, just a lot of experimentation, every day over a period of weeks—the “Experimentierphase,” as the indispensable audio engineer, Volker Müller, called it. In all, the piece required seven months of daily work in the studio, much of which was taken up with intonation and onset transients (my emphasis), particularly at high speeds where onset transients can quickly get noisy.
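The recipe can be tried out in a few lines of code. This is only a sketch of the idea as described in the liner notes, not a reconstruction of what was done on the Synthi 100. The partial ratios 1-3-4-6-9-18 come from the quote, while the 110 Hz fundamental, the loudness values and the attack times are placeholder assumptions of mine, chosen simply so that loudness rises and onsets get harder towards the top.

```python
import math

PARTIALS = (1, 3, 4, 6, 9, 18)  # whole-number ratios from the liner notes


def partial_settings():
    """Return (ratio, amplitude, attack_seconds) for each sine tone.

    Assumed values: amplitude rises linearly from 0.1 (soft fundamental)
    to 1.0 (top partial); attack time falls from 50 ms to 2 ms, so the
    onset gets harder towards the top.
    """
    last = len(PARTIALS) - 1
    return [
        (ratio, 0.1 + 0.9 * i / last, 0.050 - 0.048 * i / last)
        for i, ratio in enumerate(PARTIALS)
    ]


def glow_tone(fundamental_hz=110.0, duration_s=1.0, sample_rate=44100):
    """Sum the six sine tones into a list of samples in the range -1..1."""
    settings = partial_settings()
    total_amp = sum(amp for _, amp, _ in settings)
    samples = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate
        value = 0.0
        for ratio, amp, attack in settings:
            envelope = min(1.0, t / attack)  # linear ramp up to full level
            value += amp * envelope * math.sin(
                2 * math.pi * fundamental_hz * ratio * t)
        samples.append(value / total_amp)  # normalise to avoid clipping
    return samples
```

The resulting list can be written out with Python’s standard `wave` module for listening. Whether it ‘glows’ is another matter: the seven months of daily studio work went into exactly the details that are guessed at here.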

The Unseen Worlds site has some other goodies as well. I’ve been enjoying James Rushford’s mesmerising See the Welter, “written partly as an exercise in helping me, as a composer-performer, understand the extraordinary and mystifying Música Callada”.

I have found that each work has uniquely influenced my interpretation of the other. Thus, my very slow and wandering performance of Mompou’s work seems a result of frequently playing See The Welter together with it.

Mompou’s Música Callada is a set of pieces I was first introduced to by Herbert Henck back in my Bremen student days – a marked contrast to the more complex works he was known to tackle: Stockhausen’s Klavierstücke or Clarence Barlow’s Çoğluotobüsişletmesi, for example.

Mompou’s pieces are beautiful, almost Sakamotoesque in some cases, especially so with James Rushford’s wonderfully clear recording. It’s his See The Welter though that keeps drawing me back.

* * *

I’ve been thinking further about the connection between war and aesthetics, having touched on the topic in my previous newsletter. Roger Fenton was one of the first photographers to document a war – the Crimean War as it turns out – with one of his most famous pictures being the Valley of the Shadow of Death.

Here are all of his Crimean Photographs.

Something reminded me of a tripartite article on the placement of the cannonballs in the photograph that I’d read many years ago. A (very!) lengthy investigation into the question of whether said cannonballs were manually placed for the sake of the picture or not.

(I recently read a tweet somewhere about how the West loves thorough investigations after the fact (this was in relation to the Bucha war crimes), but is less good at investing the same amount of energy in actions that could prevent those disasters in the first place.)

The Crimean War also saw the first tactical use of railways and other modern inventions, …It was the first European war to be photographed.

The Wikipedia entry on the war also offers the following commentary:

…when in 1870 Russia was able calmly to secure the revocation of the Treaty, which disarmed her in the Black Sea, the view became general that the war was stupid and unnecessary, and effected nothing… The Crimean war remained as a classic example… of how governments may plunge into war, how strong ambassadors may mislead weak prime ministers, how the public may be worked up into a facile fury, and how the achievements of the war may crumble to nothing.

That was written from the British perspective and as an argument for pacifism. History wishing to repeat itself, this time round for the Russians?

But there were positive changes that the war set in motion. Florence Nightingale, for example, rose to prominence as a result of her experiences there and the reforms that grew out of them.

The Crimean War was a contributing factor in the Russian abolition of serfdom in 1861: Tsar Alexander II (Nicholas I’s son and successor) saw the military defeat of the Russian serf-army by free troops from Britain and France as proof of the need for emancipation. The Crimean War also led to the realisation by the Russian government of its technological inferiority, in military practices as well as weapons.

Still thinking about war and aesthetics I happened to chance across an IRCAM livestream on the Xenakis Polytopes. In it, some comments on Xenakis’ integration of his war experiences into his art – a sublimation of those experiences ‘after the fact’, rather than a glorification and provocation of them, as with the Italian Futurists.

* * *

Closing off with another, not at all abstract, side of the war – an image that has stuck with me amongst the endless media barrage of death and destruction has been the moment (around 1′39″) a volunteer gently touches the fingers of an elderly woman being evacuated in this short BBC documentary.

All the best
↬ Rudiger

<a rel="me" class="p-name u-url" href="https://rudigermeyer.com">Rudiger Meyer</a> is a composer interested in the play between traditional concert music and new media.