Two incidents arose last week that seemed to strike at the heart of nascent technologies that use a person’s voice for secure identification.
The first was Queensland Police warning that the “Can you hear me?” scam had arrived in Australia. Originating from the United States, the scheme involves a malicious caller pretending to be a staff member from a utilities provider and asking “can you hear me?” – in order to induce an answer of “yes”. If the victim answers, the “yes” is recorded and later used to impersonate the person on phone banking systems.
The second bit of news was that a Canadian startup had developed technology that can imitate anyone’s speech using just a small voice sample. Montreal-based Lyrebird, named after the Australian bird that mimics the songs of others, as well as everyday human sounds, claims that just a one-minute recording of a person’s speech is required for its technology to identify the voice “DNA”, which can then be used to synthesise any words.
While its technology has not gone live yet, Lyrebird has published samples of the synthesised voices of Donald Trump, Barack Obama and Hillary Clinton to make its point.
Such incidents could make people nervous about the rising use of voice as an automated identifier. All sorts of government agencies and corporate organisations – such as the Australian Taxation Office, ANZ Bank and Barclays — are starting to use such technology to reduce call centre labour costs and make identification less of a hassle for customers.
But Nuance Communications, the multinational that’s provided virtual assistants to a number of Australian organisations — including ANZ, ATO and iiNet — insists there is nothing to worry about.
“Most people would be able to, just with the human ear, distinguish that it’s somewhat unnatural when those computers generate those voices,” said the company’s director of voice biometrics Brett Beranek.
“Our algorithms can pick up those unnatural voice characteristics as well. We have the ability to perform [the detection] at a much higher level of precision than the human ear.”
Another clue that voice biometrics picks up on is natural variation: the same words are never spoken identically on repeated occurrences. Beranek said that synthesised speech will likely repeat the same sounds.
“If I ask you to repeat the phrase ‘my voice my password’ 100 times and ask you to repeat it in the most consistent way possible, each one of those times would still be unique,” he told Business Insider. “And one of the challenges with a recording is that it doesn’t have that natural variability.”
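The variability check Beranek describes can be illustrated with a simple sketch. This is not Nuance’s actual algorithm; the feature vectors, similarity measure and threshold below are illustrative assumptions. The idea is that a live speaker produces slightly different features every time, whereas a replayed recording or synthesiser repeating the same sounds produces near-identical ones.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def looks_like_replay(utterances, threshold=0.999):
    """Flag a set of repeated utterances as a suspected replay if any
    pair is near-identical. Live speech varies slightly every time;
    a replayed recording yields essentially the same features."""
    for i in range(len(utterances)):
        for j in range(i + 1, len(utterances)):
            if cosine_similarity(utterances[i], utterances[j]) >= threshold:
                return True
    return False

# A replayed recording: two identical feature vectors.
replayed = [[0.2, 0.5, 0.1], [0.2, 0.5, 0.1]]
# A live speaker: slight natural variation each time.
live = [[0.2, 0.5, 0.1], [0.22, 0.48, 0.12]]

print(looks_like_replay(replayed))  # True
print(looks_like_replay(live))      # False
```

A production system would of course compare rich acoustic features rather than toy three-number vectors, but the principle is the same: too little variability is itself a red flag.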
When such unnatural tendencies are picked up, Nuance’s voice technology has “mechanisms” to challenge the speaker to verify their authenticity.
“For example, we’ll ask the caller to say something random – something that they weren’t expecting to speak,” he said.
“If there is a legitimate individual on the phone then they’ll have no issues providing that additional audio but if it’s a recording then they’ll be hard pressed to say the sentence asked of them.”
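The challenge-question mechanism amounts to a liveness test: the system asks for a phrase chosen at random after the call begins, which no pre-made recording can contain. The sketch below is a hypothetical illustration of that protocol, not Nuance’s implementation; the word list and the plain text comparison are assumptions made for brevity.

```python
import random

# Hypothetical pool of words for building unpredictable challenge phrases.
PROMPT_WORDS = ["apple", "river", "seven", "window", "orange", "cloud"]

def make_challenge(n_words=3):
    """Generate a random phrase the caller must repeat. Because the
    phrase is chosen after the call starts, a pre-recorded 'yes' or
    any other canned audio cannot satisfy it."""
    return " ".join(random.sample(PROMPT_WORDS, n_words))

def verify_response(challenge, transcript):
    """Accept only if the caller spoke the exact challenge phrase.
    (A real system would also score the audio against the enrolled
    voiceprint; here we check the words alone.)"""
    return transcript.strip().lower() == challenge.lower()

challenge = make_challenge()
print("Please say:", challenge)
print(verify_response(challenge, challenge))  # a live caller repeats it
print(verify_response(challenge, "yes"))      # a replayed recording cannot
```

In practice the response would be transcribed from audio and scored against the speaker’s voiceprint at the same time, so an attacker needs both the right words and the right voice.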
Trump says he’s not a robot
In spite of the numerous ways the technology can defeat artificial voices, Nuance is in touch with voice synthesis companies to stay one step ahead of the game.
“We reach out to any technology vendor… with technology in this domain that could be misused,” Beranek said, while declining to confirm specific vendors, such as Lyrebird.
“And we try to see if they’re willing to ensure that their technology has certain characteristics in the algorithms that facilitate detection [by Nuance]… so we don’t have to go to the challenge questions.”
The executive added that so far such approaches for collaboration had been met each time with a “positive reception”.
Lyrebird’s self-proclaimed world-first technology is not yet available for use, but one of its current audio samples does show a fake Trump saying the phrase “I am not a robot, my intonation is always different” in many different ways.
The startup, established by three current doctoral students at the University of Montreal, is spruiking its technologies for altruistic uses such as virtual assistants, movie production, speech synthesis for the disabled and audio books with celebrity voices. But it does admit the product could be used for nefarious purposes.
An ethics section on the venture’s website notes that voice recordings are treated in many jurisdictions as strong legal evidence, but that its imitation technology could call into question “the validity of such evidence, as it allows to easily manipulate audio recordings”.
“This could potentially have dangerous consequences such as misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else,” the page reads.
The company stated that the release of its software in the public domain would raise awareness of the possibility of voice fraud and the decreasing value of audio recordings as legal evidence.
As for the “can you hear me” scam, fact-checking site Snopes reports no documented cases of fraud actually committed with a recording of “yes”. Police in the US state of Virginia first issued a warning about the hypothetical scam in January, along with the Better Business Bureau.