May 01, 2017

Lyrebird, a Montreal-based AI startup, has taken a page straight out of Mission Impossible and created a voice imitation algorithm that will allow developers to “copy the voice of anyone” after listening to just 60 seconds of sample audio.

Adobe, Google and others have created lifelike synthesized voices, but none has been able to mimic a voice from as little sample audio as Lyrebird. Adobe’s Project VoCo, for example, requires a minimum of 20 minutes of sample audio before it can mimic a voice. Lyrebird also claims it can generate 1,000 sentences in less than half a second using GPU clusters.

To demo what its deep learning system can do, Lyrebird created the following synthesized tracks of President Donald Trump, Barack Obama and Hillary Clinton. Listen carefully to the Trump clips, where you can hear the different intonations Lyrebird created.

Here’s a fake President Donald Trump sample:

And another fake Trump sample:

Here’s a fake Barack Obama sample:

Here’s a discussion between fake Trump, fake Obama and fake Hillary Clinton:

The synthesized voices certainly aren’t perfect; they sound a little too robotic, but you can imagine the improvements the technology will make in the coming years. Lyrebird will eventually offer developers an API for using the voice imitation technology in personal assistants, audiobook narration with famous voices, connected devices, speech synthesis for people with disabilities, animated movies and video games. Lyrebird will also offer a catalog of different voices and let users design their own voices tailored to their needs.

While Lyrebird wants the technology to be used for good, it’s easy to see how it could fall into the wrong hands and be used to create false recordings of people, with potentially dangerous consequences. Lyrebird addresses the ethics of its voice imitation technology on its website, but we’re not sure the answer is adequate.

“Lyrebird is the first company to offer a technology to reproduce the voice of someone as accurately and with as little recorded audio. Such a technology raises important societal issues that we address in the next paragraphs.

“Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries. Our technology questions the validity of such evidence as it allows to easily manipulate audio recordings. This could potentially have dangerous consequences such as misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else.

“By releasing our technology publicly and making it available to anyone, we want to ensure that there will be no such risks. We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible. More generally, we want to raise attention about the lack of evidence that audio recordings may represent in the near future.”

Lyrebird’s technology is still in the developmental stages, but the company says 6,000 people have already signed up for early access to its APIs. Lyrebird will also be adding support for different languages.

Lyrebird relies on deep learning models developed at the MILA lab of the University of Montreal, where its three founders are currently PhD students: Alexandre de Brebisson, Jose Sotelo and Kundan Kumar.