Weird A.I. Yankovic, a cursed deep dive into the world of voice cloning

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

In the parallel universe of last year’s Weird: The Al Yankovic Story, Dr. Demento encourages a young Al Yankovic (Daniel Radcliffe) to move away from song parodies and start writing original songs of his own. During an LSD trip, Al writes “Eat It,” a 100% original song that’s definitely not based on any other song, which quickly becomes “the biggest hit by anybody, ever.”

Later, Weird Al’s enraged to learn from his manager that former Jackson 5 frontman Michael Jackson turned the tables on him, changing the words of “Eat It” to make his own parody, “Beat It.”

This got me thinking: what if every Weird Al song was the original, and every other artist was covering his songs instead? With recent advances in A.I. voice cloning, I realized that I could bring this monstrous alternate reality to life.

This was a terrible idea and I regret everything.

Everything that is wrong with, and everything that is right with, AI voice cloning, brought together in one place. Hearing simulations of artists like Michael Jackson, Madonna, and Kurt Cobain singing Weird Al’s versions of their songs is… strange and unsettling.

Some of them are pretty convincing, which is a useful and accessible reminder about how powerful these tools are becoming. An under-reported story from a few years back identified what might be the first recorded case of criminals using AI-based voice spoofing as part of a telephone scam, and since then the technology needed to enact such fraud has only become more widely-available. While this weirder-than-Weird-Al project is first and foremost funny, for many it foreshadows darker things.

AI is lying to you

I’ve resisted writing about the current trends in AI because, well, others are already doing it better.1 But I was inspired by Garrett’s observation that – according to the Washington Post – the C4 dataset has tokenised his personal website.

Screenshot from the Washington Post, showing a search for 'danq.me' returning 300k tokens (0.0002% of the dataset). The site is ranked 61,724.
My website is in the dataset too, but with a massive 300,000 tokens. Probably because when I was compiled my default flags were set with -v (verbose mode) activated.

Much has been said about how ChatGPT and her friends will hallucinate and mislead. Let’s take an example.

Remember that ChatGPT has almost-certainly read basically everything I’ve ever written online – it might well be better-informed about me than you are – as you read this:

Dan asks: "Tell me about the blogger who runs DanQ.me" ChatGPT responds: "Dan Q is the blogger and web developer who runs the website "DanQ.me". He is a British computer scientist, game designer, and open-source software developer. DanQ.me is a personal website where Dan writes about a variety of topics, including technology, programming, and gaming. He is also known for creating the web-based puzzle game "Notpron," which has gained a large following online. In addition to his work on "DanQ.me" and "Notpron," Dan has contributed to several open-source software projects, including the Linux operating system and the Mozilla web browser. He has also worked as a software developer for several companies, including Google and Canonical. Dan is active on social media, particularly Twitter, where he frequently shares his thoughts on technology and other topics." The screenshot is marked up with the bits about gaming and being active on social media marked "stretching/exaggerating", the bits about Notpron, Linux, Mozilla, and Canonical being "believable lies", and the bits about Twitter and Google identified as "complete bollocks"
Given that ChatGPT has all the information it needs to talk about me accurately, it comes up with a surprising amount of crap.

When I asked ChatGPT about me, it came up with a mixture of truths and believable lies2, along with a smattering of complete bollocks.

In another example, ChatGPT hallucinates extra detail specifically because its own earlier mistake has already set the direction of the conversation. At this point, it digs its heels in and commits to its claim, like the stubborn guy in the corner of the pub who doubles down on his bullshit.

If you were to ask at the outset who wrote Notpron, ChatGPT would have gotten it right, but because it already mis-spoke, it’s now trapped itself in a lie, incapable of reconsidering what it said previously as having been anything but the truth:

Asked to elaborate on "Notpron", ChatGPT says - Dan Q's custom "Notpron" challenge can be found at the following URL: https://www.danq.me/notpron/. This challenge features unique levels and puzzles that were created by Dan Q himself, and it is known for being particularly difficult and challenging. It has gained a large following among "Notpron" enthusiasts and puzzle game fans. If you decide to take on the challenge, be prepared to think outside the box and use your problem-solving skills to their fullest!"
Notpron is great and all, but it was written by David Münnich, not me. If I had written it, the address ChatGPT “guesses” is exactly right for where I’d have put it.

Simon Willison says that we should call this behaviour “lying”. In response, several people told him that the word “lying” excessively anthropomorphises these chatbots, implying that they’re deliberately attempting to mislead their users. Simon retorts:

I completely agree that anthropomorphism is bad: these models are fancy matrix arithmetic, not entities with intent and opinions.

But in this case, I think the visceral clarity of being able to say “ChatGPT will lie to you” is a worthwhile trade.

I agree with Simon. ChatGPT and systems like it are putting accessible AI into the hands of the masses, and that means that the people who are using it don’t necessarily understand – nor desire to learn – the statistical mechanisms that actually underpin the AI’s “decisions” about how to respond.

Trying to explain how and why their new toy will get things horribly wrong is hard, and it takes a critical eye, time, and practice to begin to discover how to use these tools effectively and safely.3 It’s simpler just to say “Here’s a tool; by the way, it’s a really convincing liar and you can’t trust it even a little.”

Giving people tools that will lie to them. What an interesting time to be alive!

Footnotes

1 I’m tempted to blog about my experience of using Stable Diffusion and GPT-3 as assistants while DMing my regular Dungeons & Dragons game, but haven’t worked out exactly what I’m saying yet.

2 That ChatGPT lies won’t be a surprise to anybody who’s used the system, nor to anybody who understands the fundamentals of how it works, but as AIs get integrated into more and more things, we’re going to need to teach a level of technical literacy about what that means, just like we should about, say, Wikipedia.

3 For many of the tasks people talk about outsourcing to LLMs, it would take less effort for a human to learn how to do the task than it would for them to learn how to supervise an AI performing the task! That’s not to say they’re useless: just that (for now at least) you should only trust them to do something that you could do yourself and are therefore able to critically assess how well the machine did it.


Announcers and Automation

Nowadays if you’re on a railway station and hear an announcement, it’s usually a computer stitching together samples1. But back in the day, there used to be a human with a Tannoy microphone sitting in the back office, telling you about the platform alterations and destinations.

I had a friend who did it as a summer job, once. For years afterwards, he had a party trick that I always quite enjoyed: you’d say the name of a terminus station on a direct line from Preston, e.g. Edinburgh Waverley, and he’d respond in his announcer-voice: “calling at Lancaster, Oxenholme the Lake District, Penrith, Carlisle, Lockerbie, Haymarket, and Edinburgh Waverley”, listing all of the stops on that route. It was a quirky, beautiful, and unusual talent. Amazingly, when he came to re-apply for his job the next summer he didn’t get it, which I always thought was a shame because he clearly deserved it: he could do the job blindfold!

There was a strange transitional period during which we had machines to do these announcements, but they weren’t that bright. Years later I found myself on Haymarket station waiting for the next train after mine had been cancelled, when a robot voice came on to announce a platform alteration: the train to Glasgow would now be departing from platform 2, rather than platform 1. A crowd of people stood up and shuffled their way over the footbridge to the opposite side of the tracks. A minute or so later, a human announcer apologised for the inconvenience but explained that the train would be leaving from platform 1, and to disregard the previous announcement. Between then and the train’s arrival the computer tried twice more to send everybody to the wrong platform, leading to a back-and-forth argument between the machine and the human somewhat reminiscent of the white zone/red zone scene from Airplane! It was funny perhaps only because I wasn’t among the people whose train was in superposition.

Clearly even by then we’d reached the point where the machine was well-established and it was easier to openly argue with it than to dig out the manual and work out how to turn it off. Nowadays that’s probably even more true, though hopefully the machines are less error-prone.

The "Mercado de Abasto" (central wholesale fruit and vegetable market) of Rosario, Argentina, 1931. Horses with carts work alongide automobiles and an omnibus.

When people talk about technological unemployment, they focus on the big changes, like how a tipping point with self-driving vehicles might one day revolutionise the haulage industry… along with the social upheaval that comes from forcing a career change on millions of drivers.

But in the real world, automation and technological change come in salami slices. Horses and carts were seen alongside the automobile for decades. And you still find stations with human announcers. Even the most radically-disruptive developments don’t revolutionise the world overnight. Change is inevitable, but with preparation, we can be ready for it.

Footnotes

1 Like ScotRail’s set, voiced by Alison McKay, which computers can even remix for you over a lo-fi hip-hop beat if you like.

OpenAI-powered Linux shell uses AI to Do What You Mean

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

It’s like Alexa/Siri/Cortana for your terminal!

This is a basic Python shell (really, it’s a fancy wrapper over the system shell) that takes a task and asks OpenAI for what Linux bash command to run based on your description. For safety reasons, you can look at the command and cancel before actually running it.

Of all the stupid uses of OpenAI’s GPT-3, this might be the most-amusing. It’s really interesting to see how close – sometimes spot-on – the algorithm comes to writing the right command when you “say what you mean”. Also, how terribly, terribly ill-advised it would be to actually use this for real.
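To give a sense of how little code an idea like this needs, here’s a minimal sketch of my own – not the project’s actual source – in which the prompt wording, model name, and use of the (legacy) openai-python Completion API are all my assumptions:

```python
# A rough sketch of the "Do What You Mean" shell idea: describe a task in
# English, ask a GPT-3 completion model for a single bash command, show it,
# and only run it once a human has said yes.
import os
import subprocess

import openai  # legacy openai-python (pre-1.0) Completion API

openai.api_key = os.environ["OPENAI_API_KEY"]

PROMPT = (
    "Respond with a single Linux bash command, and nothing else, "
    "that accomplishes the following task:\n{task}\nCommand:"
)


def suggest_command(task: str) -> str:
    """Translate a natural-language task into a candidate bash command."""
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=PROMPT.format(task=task),
        max_tokens=100,
        temperature=0,
    )
    return response.choices[0].text.strip()


if __name__ == "__main__":
    while True:
        task = input("What do you want to do? ")
        if not task:
            break
        command = suggest_command(task)
        print(f"Suggested: {command}")
        # The all-important safety valve: nothing runs until you've read it.
        if input("Run it? [y/N] ").strip().lower() == "y":
            subprocess.run(command, shell=True)
```

The confirmation prompt is doing all of the heavy lifting, safety-wise, which is rather the point: the model’s suggestion is often plausible, occasionally perfect, and sometimes something you really wouldn’t want executed unread.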

AI as an Author

I’ve been watching the output that people around the Internet have been producing using GPT-3 (and its cousins), an AI model that can produce long-form “human-like” text. Here are some things I’ve enjoyed recently:

I played for a bit with AI Dungeon’s (premium) Dragon engine, which came up with Dan and the Spider’s Curse when used as a virtual DM/GM. I pitched an idea to Robin lately that one could run a vlog series based on AI Dungeon-generated adventures: coming up with a “scene”, performing it, publishing it, and taking suggestions via the comments for the direction in which the adventure might go next (but leaving the AI to do the real writing).

Today is Spaceship Day's slapping contest
Today is Spaceship Day starts out making a little sense but this soon gives way to a more thorough absurdism.

Today is Spaceship Day is a Plotagon-powered machinima based on a script written by Botnik’s AI. So not technically GPT-3, if you’re being picky, but still amusing to see how and what the AI’s creative mind has come up with.

The holy founding text of The Church of the Next Word, as revealed to Frank Lantz, takes the idea in a different direction. Republished on his blog by Matt Webb (because who wants to read text, in an image, in a Tweet?), it represents an attempt to establish the tenets of a new religion, as imagined by GPT-3. The seventh principle of Nextwordianism is especially profound:

Language contains the map to a better world. Those that are most skilled at removing obstacles, misdirection, and lies from language, that reveal the maps that are hidden within, are the guides that will lead us to happiness.

Yesterday, The Guardian published the op-ed piece A robot wrote this entire article. Are you scared yet, human? It’s edited together from half a dozen or so essays produced by the AI from the same starting prompt, but the editor insists that this took less time than the editing process on most human-authored op-eds. It’s good stuff. I found myself reminded of Nobody Knows You’re A Machine, a short story I wrote about eight years ago and was never entirely happy with but which I’ve put online in order to allow you to see for yourself what I mean.

Upside Down Landscape, drawn by Janelle Shane following a prompt by an AI
If I came across these hills – with or without deer running atop them – I’d certainly be thinking “yeah, there’s something off about this place.”

But my favourite so far must be GPT-3’s attempt to write its own version of Expert judgment on markers to deter inadvertent human intrusion into the Waste Isolation Pilot Plant, which occasionally circulates the Internet retitled with its line This place is not a place of honor…no highly esteemed deed is commemorated here… nothing valued is here. The original document was a report into how humans might mark a nuclear waste disposal site in order to discourage deliberate or accidental tampering with the waste stored there: a massive challenge, given that the waste will remain dangerous for many thousands of years! The original paper’s worth a read, of course, but mostly as a preface to reading a post by Janelle Shane (whose work I’ve mentioned before) about teaching GPT-3 to write nuclear waste site area denial strategies. It’s pretty special.

As effective conversational AI becomes increasingly accessible, I become increasingly convinced that we might eventually see a sandwichware future, where it’s cheaper for an appliance developer to install an AI into the device (to allow it to learn how to communicate with your other appliances, in a human language, just like you will) rather than rely on a static and universal underlying computer protocol as an API. Time will tell.

Meanwhile: I promise that this post was written by a human!

A Trip Through New York City in 1911

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

With help from a neural network, Denis takes original cinematography of New York City in 1911 and uploads it as a cleaned, upscaled, high-framerate, colourised YouTube video. It’s pretty remarkable: compare it to the source video to see how much of a difference it makes – side-by-side, the smoothness of the frame rate alone is striking. It’s a shame that nothing can be done about the underexposed bits of the film where contrast detail is lacking: I wonder if additional analysis of the original print itself might be able to extract some extra information from these areas and then improve them using the same kinds of techniques.

In any event, a really interesting window-to-history!

Sandwichware: Machines Talking to Machines About Humans

A recent observation by Phil Gyford reminded me of a recurring thought I’ve had. He wrote:

While being driven around England it struck me that humans are currently like the filling in a sandwich between one slice of machine — the satnav — and another — the car. Before the invention of sandwiches the vehicle was simply a slice of machine with a human topping. But now it’s a sandwich, and the two machine slices are slowly squeezing out the human filling and will eventually be stuck directly together with nothing but a thin layer of API butter. Then the human will be a superfluous thing, perhaps a little gherkin on the side of the plate.

While we were driving I was reading the directions from a mapping app on my phone, with the sound off, checking the upcoming turns, and giving verbal directions to Mary, the driver. I was an extra layer of human garnish — perhaps some chutney or a sliced tomato — between the satnav slice and the driver filling.

What Phil’s describing is probably familiar to you: the experience of one or more humans acting as the go-between to allow two machines to communicate. If you’ve ever re-typed a document that was visible on another screen, read somebody a password over the phone, given directions from a digital map, or used a pendrive to carry files between computers that weren’t talking to one another properly, then you’ve done it: you’ve been the soft wet meaty middleware that bridged two already semi-automated (but not quite automated enough) systems.

Galaxy Quest: Tawny Madison says "Gosh, I'm doing it. I'm repeating the damn computer."
Sigourney Weaver as Gwen DeMarco as Tawny Madison realised what she was doing back in 1999. Should I be alarmed that a science fiction spoof is a better indicator of the future than the science fiction it parodies?

This generally happens because of the lack of a common API (a communications protocol) between two systems. If your phone and your car could just talk it out then the car would know where to go all by itself! Or, until we get self-driving cars, it could at least provide the directions in a way that was appropriately-accessible to the driver: heads-up display, context-relative directions, or whatever.

It also sometimes happens when the computer-to-human interface isn’t good enough; for example I’ve often offered to navigate for a driver (and used my phone for the purpose) because I can add a layer of common sense. There’s no need for me to tell my buddy to take the second exit from every roundabout in Milton Keynes (did you know that the town has 930 of them?) – I can just tell them that I’ll let them know when they have to change road and trust that they’ll just keep going straight ahead until then.

Finally, we also sometimes find ourselves acting as a go-between to filter and improve information flow when the computers don’t have enough information to do better by themselves. I’ll use the fact that I can see the road conditions and the lane markings and the proposed route ahead to tell a driver to get into the right lane with an appropriate amount of warning. Or if the driver says “I can see signs to our destination now, I’ll just keep following them,” I can shut up unless something goes awry. Your in-car SatNav can’t do that because it can’t see and interpret the road ahead of you… at least not yet!

Oxbotica Driven self-driving car in Oxford.
I was certainly glad that this prototype self-driving car could “see” me when it overtook my bike the other day.

But here’s my thought: claims of an upcoming AI winter aside, it feels to me like we’re making faster progress in technologies related to human-computer interaction – voice and natural language interfaces, popularised by virtual assistants like Siri and Alexa and by chatbots – than we are in technologies related to universal computer interoperability. Voice-controlled computers are hip and exciting and attract a lot of investment, but interoperable systems are hampered by two major things. The first thing holding back interoperability is business interests: for the longest while, for example, you couldn’t use Amazon Prime Video on a Google Chromecast because the two companies couldn’t play nice. The second is a lack of interest by manufacturers in developing open standards: every smart home appliance manufacturer wants you to use their app, and so your smart speaker manufacturer needs to implement code to talk to each and every one of them, and when they stop supporting one… well, suddenly your thermostat jumps permanently from smart mode to dumb mode.

A thing that annoys me is that from a technical perspective making an open standard should be a much easier task than making an AI that can understand what a human is asking for or drive a car safely or whatever we’re using them for this week. That’s not to say that technical standards aren’t difficult to get right – they absolutely are! – but we’ve been practising doing it for many, many decades! The very existence of the Internet over which you’ve been delivered this article is proof that computer interoperability is a solvable problem. For anybody who thinks that the interoperability brought about by the Internet was inevitable or didn’t take lots of hard work, I direct you to Darius Kazemi’s re-reading of the early standards discussions, which I first plugged a year ago; but the important thing is that people were working on it. That’s something we’re not really seeing in the Internet of Things space.

XKCD 927: Standards
Engineers: “Standards are good. Let’s have lots of them.”
Everybody else: “…?”

On our current trajectory, it’s absolutely possible that our virtual assistants will reach a point of becoming perfectly “human” communicators long before we can reach agreements about how they should communicate with one another. If that’s the case, those virtual assistants will probably fall back on using English-language voice communication as their lingua franca. In that case, it’s not unbelievable that ten to twenty years from now, the following series of events might occur:

  1. You want to go to your friends’ house, so you say out loud “Alexa, drive me to Bob’s house in five minutes.” Alexa responds “I’m on it; I’ll let you know more in a few minutes.”
  2. Alexa doesn’t know where Bob’s house is, but it knows it can get it from your netbook. It opens a voice channel over your wireless network (so you don’t have to “hear” it) and says “Hey Google, it’s Alexa [and here’s my credentials]; can you give me the address that [your name] means when they say ‘Bob’s house’?” And your netbook responds by reading out the address details, which Alexa then understands.
  3. Alexa doesn’t know where your self-driving car is right now and whether anybody’s using it, but it has a voice control system and a cellular network connection, so Alexa phones up your car and says: “Hey SmartCar, it’s Alexa [and here’s my credentials]; where are you and when were you last used?”. The car replies “I’m on the driveway, I’m fully-charged, and I was last used three hours ago by [your name].” So Alexa says “Okay, boot up, turn on climate control, and prepare to make a journey to [Bob’s address].” In this future world, most voice communication over telephones is done by robots: your virtual assistant calls your doctor’s virtual assistant to make you an appointment, and you and your doctor just get events in your calendars, for example, because nobody manages to come up with a universal API for medical appointments.
  4. Alexa responds “Okay, your SmartCar is ready to take you to Bob’s house.” And you have no idea about the conversations that your robots have been having behind your back (something like the sketch below).
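To make the contrast concrete, here’s a deliberately-silly toy sketch of the difference between a structured API and the English-as-lingua-franca fallback described above. The names, phrasing, and parsing are all invented for illustration; no real Alexa or Google interface works this way:

```python
# A toy illustration of "sandwichware": two assistants with no shared API
# falling back on English as their lingua franca.
import re
from typing import Optional

contacts = {"Bob": {"street": "1 Example Road", "town": "Preston"}}


# What a proper structured API looks like: unambiguous fields, trivially reliable.
def structured_lookup(name: str) -> dict:
    return contacts[name]


# The sandwichware version: one machine phrases its request in English...
def compose_english_request(name: str) -> str:
    return (
        "Hey, it's Alexa; can you give me the address that your owner "
        f"means when they say '{name}'s house'?"
    )


# ...and the other has to parse a sentence to work out what's being asked.
def parse_english_request(sentence: str) -> Optional[str]:
    match = re.search(r"say '(.+)'s house'", sentence)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Structured route: can't be misunderstood.
    print(structured_lookup("Bob"))

    # English route: works right up until somebody rephrases the question.
    request = compose_english_request("Bob")
    name = parse_english_request(request)
    print(contacts[name] if name else "Sorry, I didn't understand that.")
```

The structured lookup can’t misunderstand; the English round-trip only works for as long as both sides happen to phrase and parse the sentence the same way – which is rather the point.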

I’m not saying that this is a desirable state of affairs. I’m not even convinced that it’s likely. But it’s certainly possible if IoT development keeps focussing on shiny friendly conversational interfaces at the expense of practical, powerful technical standards. Our already topsy-turvy technologies might get weirder before they get saner.

But if English does become the “universal API” for robot-to-robot communication, despite all engineering common sense, I suggest that we call it “sandwichware”.
