Friday, May 25, 2018


Do you HEAR the same thing I HEAR? Does your Alexa or Google Home HEAR the same thing? Probably not, which is why you're hearing a lot of four letter words.

Last week, everyone was going nuts about how the same audio clip could be heard by some people as saying "Laurel" and by others as "Yanni".

This week, people are going nuts about how an Amazon Alexa smart speaker misheard its trigger word and then misheard a series of other phrases, treating them as commands to call someone.

In the first case, people went less nuts when someone broke the audio down and filtered out parts of it to show how different parts of the waveform could be heard by some people and not others. Those differences shape what our ears pick up and what our brains process. Similarly, some pointed out that if we are expecting to hear something, our brain is more ready to process it.

Oddly enough... Alexa's AI works similarly. Not identically, but there are parallels.

The device itself uses a range of technologies to pick up and focus on a particular voice and eliminate background noise. In simple terms, it uses a bunch of microphones (7 in the Echo's case, 2 on a Google Home) to measure the delay between when an audio signal arrives at each one, applies Machine Learning techniques to determine what is a real command and what is just noise, and then makes sure it pays attention to the real command. (Google has better AI training, so it can get away with fewer microphones.) That process is good - but still not perfect. Additional noises can cause interference, it can interpret one word as another, or a person's voice can end up tricking the training slightly.
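To make the "measure the delay" part a little more concrete, here is a minimal sketch of one common way to estimate how much later a sound reaches one microphone than another: slide one recording against the other and see where they line up best (cross-correlation). This is not Amazon's or Google's actual code. The function name `estimate_delay`, the toy signal, and the 16 kHz sample rate are all my own assumptions for illustration.

```python
import math

def estimate_delay(mic_a, mic_b, max_lag):
    """Return the lag (in samples) at which mic_b best lines up with mic_a.

    A positive result means the sound reached mic_b later than mic_a.
    This is a plain brute-force cross-correlation, fine for a toy example.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(mic_a):
            j = i + lag
            if 0 <= j < len(mic_b):
                score += a * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy "voice": a short decaying burst with silence on either side.
burst = [math.sin(0.9 * n) * math.exp(-0.05 * n) for n in range(60)]
silence = [0.0] * 20

mic_a = silence + burst + silence                    # sound arrives at sample 20
mic_b = silence + [0.0] * 3 + burst + silence[:-3]   # same sound, 3 samples later

delay_samples = estimate_delay(mic_a, mic_b, max_lag=10)

# Assumed values, not from any device spec: 16 kHz sampling, sound at 343 m/s.
SAMPLE_RATE_HZ = 16_000
SPEED_OF_SOUND_M_S = 343.0
extra_distance_m = delay_samples / SAMPLE_RATE_HZ * SPEED_OF_SOUND_M_S

print(delay_samples)      # 3
print(extra_distance_m)   # roughly 0.064 m farther from mic_b than mic_a
```

With several microphones, a handful of delays like this is enough to work out roughly which direction a voice is coming from, which is what lets the device favor that direction over the noise from everywhere else.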

Since humans don't talk identically every time, Alexa and other voice recognition systems have to allow for a little bit of variance. But by allowing that, the ML systems end up relying on probabilities. The system never knows for sure that the person is who it thinks it is - just that it is "highly probable" that it is. It helps that it might only be expecting a few people, so there is a narrower range of variance to allow for, but it still isn't perfect. And, once woken, it is expecting a person to say things from a somewhat narrow set of commands, so it isn't out of the question that it "mishears" what two people are saying.
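Here is a tiny sketch of what "relying on probabilities" means in practice. The device has to pick a cutoff somewhere: too strict and it ignores you when you mumble, too loose and it wakes up for things that only sound like its name. The scores below are made up by me purely for illustration (they are not real output from any assistant), and `should_wake` and `woken_by` are hypothetical names.

```python
def should_wake(score, threshold):
    """Wake up only if the model's confidence reaches the cutoff."""
    return score >= threshold

# Made-up confidence scores that the phrase was the wake word "Alexa".
heard = [
    ("Alexa", 0.97),            # clear wake word
    ("Alexa (mumbled)", 0.74),  # real wake word, spoken badly
    ("a Lexus", 0.83),          # not the wake word, but sounds close
    ("election", 0.41),         # not the wake word
]

def woken_by(threshold):
    """List the phrases that would wake the device at this cutoff."""
    return [phrase for phrase, score in heard if should_wake(score, threshold)]

print(woken_by(0.9))  # ['Alexa']
print(woken_by(0.8))  # ['Alexa', 'a Lexus']
print(woken_by(0.7))  # ['Alexa', 'Alexa (mumbled)', 'a Lexus']
```

No cutoff in this toy list catches the mumbled "Alexa" without also waking for "a Lexus". That is the trade-off in miniature: allowing for variance in how people talk also allows for the occasional false trigger.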

(I'm sure that audio experts, AI/ML experts, and pretty much everyone else will say I'm simplifying the problem. I am.)

If you're ever curious what it is that your Alexa or Google Home hears, both let you play it back. Google offers this through its Activity Console (which shows everything Google thinks you've done - you can filter just on the Assistant). Alexa offers this on the mobile app in Settings -> History. In both cases, you can see the command itself and play the audio. Go ahead and take a listen - it may be more difficult to make out than you think.
