The pronunciation of words by computers has gotten a lot better, at least in the movies. One of the latest futuristic films, Ex Machina, stars Alicia Vikander as the humanoid robot Ava.
Meanwhile, in the real world, computer voices such as those used for Siri, Cortana and Google Now don't seem to be able to say some words correctly.
At first, I wondered if it had something to do with the fact that a lot of research on computer voicing is done in England, where they pronounce common words like "schedule" very differently.
But Andrew Breen, the research director for text-to-speech at Nuance Communications, which helped create Siri for Apple, says the location is not relevant.
Breen says the computer uses dictionaries to determine the correct way to say a word in a particular dialect or accent.
"We have a dictionary, and that's the first port of call," he says. But text-to-speech systems don't draw from full words, and mispronunciations can ensue. We reached out to our listeners for some of their favorite computer-voice flubs:
Susan Bennett, the original voice of Siri, spent four hours a day for five weeks laying down voice tracks. "Ninety percent of the phrases I recorded were nonsensical, created solely to get the sound combinations in the language," she says. "So I had to read things like: 'Say the shroding again. Say the shreeding again. Say the shrading again.' "
Breen says the computer draws from these sounds to assemble words based on pronunciations in a dictionary of American speech.
"You've got to go to that unique individual, that voice talent that's given you their basic sound system," he says. "And you've got to try and map this representation into something that you can speak back."
And this all happens in a fraction of a second.
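To make the dictionary-first approach concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Nuance's actual system: the dictionary entries, the sound symbols and the function name `sounds_for` are all hypothetical. It looks a word up first and, if the word is missing, falls back to sounding it out one letter at a time, which is roughly how a place name can come out wrong.

```python
# Hypothetical pronunciation dictionary: word -> list of sound units.
# Entries and symbols are illustrative only, not from any real system.
PRONUNCIATIONS = {
    "say": ["S", "EY"],
    "again": ["AH", "G", "EH", "N"],
    "schedule": ["S", "K", "EH", "JH", "UH", "L"],
}

# Crude one-letter-one-sound fallback (an assumption for this sketch).
LETTER_SOUNDS = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "AA", "p": "P", "q": "K", "r": "R",
    "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W", "x": "KS",
    "y": "Y", "z": "Z",
}


def sounds_for(word):
    """Return (sound_units, found_in_dictionary) for a single word."""
    key = word.lower()
    if key in PRONUNCIATIONS:
        # First port of call: the dictionary.
        return list(PRONUNCIATIONS[key]), True
    # Not in the dictionary: sound the word out letter by letter.
    guessed = [LETTER_SOUNDS[ch] for ch in key if ch in LETTER_SOUNDS]
    return guessed, False


print(sounds_for("schedule"))
print(sounds_for("Des"))
```

With this toy table, `sounds_for("Des")` voices the final "s" because nothing tells the fallback that the letter is silent.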
Unfortunately, the computer can't always distinguish between words that have the same spelling but are pronounced differently depending on the meaning. For example, Mobile is a city in Alabama, but an automobile is also mobile.
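A rough sketch of how context might tip the choice, again in Python and again hypothetical: the table `HOMOGRAPHS`, the cue list `PLACE_CUES`, the function `pronounce` and the respellings are assumptions made for illustration, not a description of how Siri or any other product works.

```python
# Hypothetical table of same-spelling words with two pronunciations.
# The respellings are rough approximations chosen for this sketch.
HOMOGRAPHS = {
    "mobile": {"place": "moh-BEEL", "common": "MOH-buhl"},
}

# Neighboring words that hint the place name is meant (an assumption).
PLACE_CUES = {"alabama", "ala", "al"}


def pronounce(word, context_words):
    """Pick a pronunciation for a known homograph, using nearby words."""
    entry = HOMOGRAPHS.get(word.lower())
    if entry is None:
        return None  # not a homograph this sketch knows about
    context = {w.lower().strip(".,") for w in context_words}
    if context & PLACE_CUES:
        return entry["place"]
    # With no helpful context, fall back to the everyday word.
    return entry["common"]


print(pronounce("Mobile", ["Mobile,", "Ala."]))
print(pronounce("Mobile", []))
```

The second call has no surrounding words to go on, so the sketch picks the everyday word even though the capital letter suggests the city.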
"There is never really a hundred percent guarantee that when you've got a word that's in common language use and is also a place name that you're not going to choose the wrong one, unless you've got full context," Breen says.
Bennett says that when she was recording for Nuance they did record some place names. But there was a lot of guesswork.
"We'd say well, this street is in New Mexico, it has a Spanish name so I bet they pronounce it correctly the Spanish way," she says. "One day I said, 'Well, why are we guessing? Don't you guys have interns or someone that can look this information up and get it right?' "
She says that never happened.
Breen says pronunciations will improve as devices get better connections to the Internet, where they can retrieve information more quickly.
But, he says, "Even with that context you have to take into account the situation where for whatever reason — maybe the device is in a tunnel or maybe it's in a room where it just can't get online — it can't just say no."
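The fallback Breen describes can be sketched in a few lines of Python. Everything here is a stand-in: `online_lookup` simulates a network service, `local_guess` simulates whatever the device has on board, and the respellings are rough approximations. The point is only that the device always answers with something.

```python
def online_lookup(word, connected):
    """Stand-in for a network pronunciation service (hypothetical)."""
    if not connected:
        raise ConnectionError("no network")
    return {"mobile": "moh-BEEL"}.get(word.lower())


def local_guess(word):
    """Stand-in for the device's built-in best guess (hypothetical)."""
    return {"mobile": "MOH-buhl"}.get(word.lower(), word.lower())


def pronounce(word, connected):
    """Prefer the online answer, but never refuse to speak."""
    try:
        answer = online_lookup(word, connected)
    except ConnectionError:
        answer = None  # in a tunnel, or in a room with no signal
    return answer if answer is not None else local_guess(word)


print(pronounce("Mobile", connected=True))
print(pronounce("Mobile", connected=False))
```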
And, in case you were wondering, Bennett does have an iPhone — but she never uses Siri. "It's difficult to hear my voice saying certain things that I would never in a million years say."
Besides, Bennett says, she talks to herself enough already.
Transcript
ROBERT SIEGEL, HOST:
And it is time for All Tech Considered.
(SOUNDBITE OF MUSIC)
SIEGEL: The voices of computers have gotten a lot better, at least in the movies.
(SOUNDBITE OF FILM, "EX MACHINA")
ALICIA VIKANDER: (As Ava) Hello.
DOMHNALL GLEESON: (As Caleb) Hi.
VIKANDER: (As Ava) I've never met anyone new before. Have you?
SIEGEL: That's the humanoid robot Ava in the film "Ex Machina" played by Alicia Vikander. Meanwhile, in the real world, computer voices can't even say things right, like the name of the Jewish new year.
SIRI: Rosh Hashanah.
SIEGEL: Oy. We're talking computer speech today on All Tech. First, here's NPR's Laura Sydell to tell us why it's so hard to get computers to pronounce words correctly.
LAURA SYDELL, BYLINE: At first, I wondered if the problem had something to do with where a lot of the work on computer voicing is done.
ANDREW BREEN: My name is Andrew Breen.
SYDELL: And he's British, and they talk funny there.
BREEN: Yeah, schedule. Schedule?
SYDELL: Right.
BREEN: Yeah, schedule - thank you.
SYDELL: I prefer schedule here in America. Breen is based in England where he is the director of research for text to speech at Nuance Communications which helped create Siri for Apple. But that's not the problem. He says the computer uses dictionaries to determine the correct way to say a word in a particular dialect or accent.
BREEN: So we have a dictionary, a dictionary of pronunciations, and that's the first port of call.
SYDELL: Which doesn't always work. I reached out to our listeners for some of their favorite mispronunciations.
ELIZABETH CONNER: Mobile, Ala.
SIRI: Mobile. Alabama.
ANDREW DART: Rue Bordeaux.
SIRI: Rue Bordeaux.
BRET MILLER: Des Moines.
SIRI: Des Moines.
SYDELL: Thanks Elizabeth Conner, Andrew Dart and Bret Miller. Breen explains it this way. Whatever dictionary your voice system uses may not have those place names, so it sounds them out with common English usage. Here's someone who knows how to say it correctly in American English.
SUSAN BENNETT: Rue Bordeaux. Des Moines. Mobile, Ala.
SYDELL: This is Susan Bennett, and...
BENNETT: This is the original voice of Siri.
SYDELL: And she's talking to me. About a decade ago, Bennett was in the studio laying down tracks like this.
BENNETT: Ninety percent of the phrases I recorded were nonsensical, created solely to get all of the sound combinations in the language. So I had to read things like - say the shrodding (ph) again; say the shreeding (ph) again; say the shrading (ph) again; say the shroading (ph) again; say the shriding (ph) again and on and on and on.
SYDELL: Odd sound combinations are used to assemble words based on pronunciations in a dictionary of American speech.
BREEN: And then you've got to go to that unique individual, that voice talent that's given you their basic sound system, and you've got to try and map this representation into something that you can speak back.
SYDELL: This all happens in a fraction of a second. And getting back to those messed up place names - as anyone who has used Siri, Cortana, Google Now or any other computer voice system knows, this doesn't always work. Let's return to a place name.
CONNER: Mobile, Ala.
SYDELL: Which several voice systems say as...
SIRI: Mobile. Alabama.
SYDELL: Unfortunately, the computer can't distinguish between Mobile and mobile.
BREEN: There is never really 100 percent guarantee that when you've got a word that's in common language use and is also a place name that you're not going to choose the wrong one unless you've got full context.
SYDELL: This may be true for a word and place name like mobile, Mobile. Still, voice of Siri artist, Bennett, says when she was recording for Nuance, they did record some place names.
BENNETT: And we'd say, well, this street is in New Mexico. It has a Spanish name. So I bet they pronounce it correctly the Spanish way. One day, I said to them - I said, well, why are we guessing? Don't you guys have interns or someone that can look this information up and get it right?
SYDELL: That never happened. Breen says pronunciations will improve as devices have better connections to the Internet where they can retrieve information more quickly.
BREEN: Even with that context, you have to take into account the situation where, for whatever reason, maybe the device is in a tunnel or maybe it's in a room where it just can't get online. You know, it can't just say no.
SYDELL: Instead, if I'm in a tunnel and I can't get to the Internet and I ask Siri about that guy with square pants, you get something like this...
SIRI: Spongebob.
BENNETT: It's difficult to hear my voice saying certain things that I would never in a million years say.
SYDELL: Susan Bennett has an iPhone but says she never uses Siri.
SIRI: Laura Sydell, NPR News. San Francisco.
Transcript provided by NPR, Copyright NPR.