“Alexa, play some music” isn’t the only time Amazon is listening to you.

Jan 7, 2019

Amazon’s voice recognition software only listens when you say the word “Alexa,” right?

That’s what most Echo and Dot buyers think because that’s what the advertising leads you to believe. As if by magic, your Alexa-enabled device “wakes up” when you say its name. But think about that for a moment. After you say the magic word, your Alexa-enabled device must listen to your request, interpret it, and respond. Just how much does Amazon really listen to inside your home? How much did you really know about how voice technology works when you unboxed your Alexa-enabled device?

(Fair warning: this is about to get awkward.)

You may have assumed your Echo or Dot listened and responded using the small computer housed inside the device itself. But that doesn’t make sense. The onboard computer simply isn’t powerful enough. And besides, Amazon continues to update the device. It must do this from a centralized server location. That’s the only place where there is enough computing power not only to interpret your request but also to update Alexa with new “skills” from third-party vendors. That’s how your device now knows how to order a pizza. Amazon needed to partner with Domino’s Pizza (in the United States) to develop that interface.
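For the technically curious, here is a minimal sketch of the flow most buyers picture: the device stays quiet until it hears the wake word, then ships the request to a central server that interprets it and answers. Everything in it is an illustrative assumption, not Amazon's actual software. `CloudService`, `device_loop`, and the keyword matching are hypothetical stand-ins, and plain text takes the place of real audio.

```python
from dataclasses import dataclass, field

WAKE_WORD = "alexa"  # the wake word named in the article


@dataclass
class CloudService:
    """Hypothetical stand-in for the centralized server; not Amazon's API."""
    stored_requests: list = field(default_factory=list)

    def handle(self, request: str) -> str:
        # Assumption for illustration: the server keeps every request it
        # receives. How long real recordings are kept is an open question.
        self.stored_requests.append(request)
        if "weather" in request:
            return "Here is today's weather report."
        if "pizza" in request:
            return "Ordering a pizza."
        return "Sorry, I don't know that one."


def device_loop(heard: list[str], cloud: CloudService) -> list[str]:
    """Process utterances in order; only wake-word requests leave the device."""
    responses = []
    for utterance in heard:
        words = utterance.lower().replace(",", "").split()
        if not words or words[0] != WAKE_WORD:
            continue  # assumed behavior: ignored locally, never uploaded
        request = " ".join(words[1:])
        responses.append(cloud.handle(request))
    return responses
```

In this toy model, a dinner conversation never reaches the server, while "Alexa, what's the weather" does, and the server remembers it. Keep that model in mind as you read on.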

Now that you know your voice recordings are sent over the internet to a centralized location, you may assume Amazon needs to store that data for a short time: long enough for its natural language processing algorithms to interpret your request for a weather report (or a pizza), gather the information, and send it back to your device to speak the response. The transaction happens so quickly that you assume Amazon would have no reason to keep the recording of your voice any longer than a few seconds. Besides, is that even feasible? Think of how much storage space Amazon would require for all of the audio files. Is there really a database somewhere storing all your “requests for weather reports”?

Those are good questions.

Imagine for a moment that you were curious about what, precisely, your Amazon Echo or Dot device recorded in your home. Now that you know it’s listening, you’d like to know what it heard. To satisfy that curiosity and put your mind at ease, you ask Amazon to send you a copy of the data your device has collected since you bought it.

After a few weeks, you receive your audio files from Amazon. Imagine your horror as you open the attachments and begin listening to the recordings: A discussion of what to have for dinner, two children arguing over a toy, a woman talking to her partner as she gets into the shower. You weren’t sure if Amazon would keep recordings at all. And if they did keep recordings, you thought your Echo or Dot recorded only your explicit requests.

But it gets worse. You don’t recognize any of the voices. With equal parts relief and horror, you realize you are listening to someone else’s Echo recordings!

 

As it turns out, all of your assumptions about voice technology were wrong.

This story isn’t a thought experiment. It is precisely what happened when a German citizen requested his data files from Amazon under the European Union’s GDPR. He expected to get a list of the products he had purchased, how he paid, and other commercial profile data Amazon had compiled. Unlike in my scenario, he wasn’t expecting audio recordings. He didn’t own an Alexa-enabled device. He shouldn’t have been getting any recordings, yet there they were.

According to the story originally reported by the German investigative magazine c’t, Amazon admitted the mistake, citing human error in sending him the wrong file.

(The statement does not say whether the company notified the person whose data was shared. Also, Amazon was compelled to comply with the request for data only because the requester was a European Union citizen. If you’re an American, or from anywhere outside the EU, good luck.)

In case any of the impacts of the story escaped your notice, let’s take a moment to summarize what this all means in simple terms, shall we?

  1. Your Alexa-enabled device listens to you more than you think it does.
  2. Your Alexa-enabled device not only listens to you but also records those sounds.
  3. Your Alexa-enabled device sends those recordings to an Amazon data center, where the company not only uses natural language processing algorithms to decode your speech and complete your request but also stores those files in a centralized database for future use.
  4. At that data center, Amazon – one of the best data management companies on the planet – has a human process to respond to your data request.
  5. As the investigative reporting shows, this human process is prone to error.

To put it in even simpler terms, if you own an Amazon Alexa-enabled device, Jeff Bezos could be the least creepy person listening to you right now.

Are you okay trading your privacy in your home for a weather report?

Or asked a different way: Is that weather report worth someone at Amazon listening to:

  • an argument with your spouse?
  • your kids playing?
  • a “tough” visit to the bathroom?
  • you and your partner having sex?

Are you okay with a random person (who received your data file by mistake) listening to that? Are you okay with a hacker listening to that? Your health insurance company? The police?

I used to believe this was a “boogieman” issue – that worst-case scenarios like the one described didn’t really happen. I used to believe people who rang the warning bell were, at best, premature fools and, at worst, fear-mongering opportunists. I used to believe those things, but I was wrong.

The European Union’s GDPR data protection law, which took effect in 2018, cast a light under the bed and showed us all that the boogieman is real. And he’s listening to you right now.

 

The tyranny of menus and why “voice” is such a big deal.

To understand why companies are investing so much in voice recognition technology, and why they risk invading your privacy, you have to understand how objectively poor today’s “digital” experience is and how it got that way.

Voice is the natural way humans interact with others and their environment. But in the early days of the internet, interactive voice technology was neither advanced enough nor cheap enough to use outside of a few advanced laboratories. The most cost-effective voice technologies of the day were “telephone menu tree” systems that infuriated even the most patient callers.

If a “natural” interface wasn’t ready for the birth of the internet, what was the next best alternative?

Cascading menus.

Borrowed from library science, the menu structure is a software engineer’s dream. It’s logical, orderly, and hierarchical. Unfortunately, menus are not how people naturally interact with information. Menus do not mimic how our brains work. Menus are not easy to use.

Menus are terrible user interfaces for most everyday functions.

As just one example, think about this simple use case: I would like to play Prince’s “1999” on my iPhone. Here are the menu-driven steps I can take:

  1. Unlock the home screen (if I have not authorized biometrics, I need to input a passcode).
  2. Tap the Music app to open it.
  3. Tap the “Artists” list.
  4. Scroll to “Prince” and tap the artist’s name.
  5. Scroll to “1999” and tap the song name.
  6. Adjust the volume as needed.

Six steps. Multiple taps and scrolls. Complex, artificial, robotic.

Or, consider this voice-based alternative:

“Siri, play Prince’s 1999.”

Four words. One voice command step. Simple, natural, intuitive.

Menus are so common that we almost forget how unnatural they are. Menus not only dominate the user interfaces of smartphones, computers, tablets, and websites; we also find them everywhere else – kiosks, airport terminals, medical devices, automobiles, and home appliances.

Think about it: That infuriating menu in your Toyota Camry, your CPAP machine, or your GE refrigerator is an ugly holdover from the early days of GopherNet and ARPANET … just as the QWERTY keyboard is an ugly holdover from the early days of mechanical typewriters.

That’s why voice is such a big deal.

Menu interactions may be behavioral (and in many ways superior to opinion-based evidence), but they are still untethered from our true thought processes. Voice interactions are different. I don’t mean the robotic voice commands you give your car; those are simply audio menus, and they are terrible. The true potential of voice is unlocked by natural language processing algorithms that learn to interpret and respond to natural human speech patterns. The best of them are learning our cadence, pitch, tone, accent, and volume – and, most importantly, our intent.

In a menu-driven world, our devices aren’t listening to us; they are waiting for input. However, when a device is listening, it doesn’t need to wait to respond. It can make suggestions to you in real time, just as another person would do in a conversation. That’s the quantum leap voice technology promises: For the first time in human history, machines can truly interact with us.

But as we’ve seen, that’s not how people think voice technology works. Because we are so used to machines waiting for our commands, we’re not conscious that many of them are now listening to us go about our daily lives.

 

I’m not sure “voice” can be trusted. Yet.

Contrary to the image advertising creates of a fully conversational human-computer interface (à la the Star Trek “computer” or “J.A.R.V.I.S.” from Marvel’s Iron Man), if you try to hold a “conversation” today with Alexa, Cortana, Siri, or Google, you will be disappointed.

Most people who use voice technology quickly learn its limitations and adjust their expectations. Most people use Alexa-enabled devices to give them weather reports or to play on-demand music. That’s it.

But if voice technology is to improve, its developers need to listen to and analyze many more interactions. Their argument for listening is simple: As consumers get better at interacting with voice technology, the technology will learn and improve. As the technology improves, consumers will expand their use of it. It’s a positive feedback loop that will (eventually) give birth to a real “J.A.R.V.I.S.” And when that happens, you’ll love it.

Perhaps. But until that day comes, you’re giving up your privacy for a weather report.

At this point, it’s fair to argue that we’ve given up our “privacy” for all manner of technological benefits and services. True, but up to this point those technologies operated on your explicit command. No one forces you to use Google Maps. No one forces you to share personal details on Facebook. No one forces you to buy from Amazon.

But voice is different.

Voice is a form of biometric data – something that is uniquely yours. Additionally, voice technology insidiously invades your privacy, always listening, always recording, and always learning more. You can see why organizations want voice analysis so desperately. It can finally break into your “inner self” rather than relying on your opinions or waiting for your command.

Voice technology is the ultimate behavioral study that you didn’t realize was happening.

Here’s the bottom line: Until organizations demonstrate they can be trusted with our private data, I’m not sure they deserve to have us give it to them for free. What’s more, they are unlikely to stop collecting your data on their own. As we’ve discussed, they need that data to improve their voice technology, and you’re willingly giving it to them. Why would they stop? They simply hope you aren’t paying attention.

It’s time that changed.

Here are a few easy things you can do today to start you on the path to reasserting the privacy in your own home:

  • Think hard about whether a voice-enabled device is right for you. That includes products from Amazon, Google, Apple, Microsoft, and others. Honestly, I don’t care if you choose one or not. Just don’t think it’s not listening to you pee. It is.
  • If you do choose to use a voice-enabled device in your home, understand that your home conversations are no longer private. Consider that every statement you make inside the comfort of your home could end up in the hands of advertisers, your government, the police, or Google.
  • Think twice about connecting your voice-enabled device to home automation and security systems. “Smart home” technology is a known source of hacks and privacy intrusions.
  • Search out and read privacy statements before you purchase a voice-enabled device. I’m not saying, “don’t buy it,” I am simply saying, “know what you’re buying.”
  • If you happen to live in the European Union, learn how to request your voice data file. It’s easy. Here’s how.
  • If you are in the United States, send a message to your representative and ask for their stand on privacy issues. That’s easy too. Here’s how.
  • I could go on for many other countries. You get the idea. The notable exception is China. They think about privacy differently.
  • Last, but not least, learn how to turn off listening when you don’t want to be heard.

Sorry, tech companies will not protect your privacy out of the goodness of their hearts. It is up to you, as the consumer, to take action.

Your voice is yours. Keep it that way.