Hacker News
Hey, Alexa, What Can You Hear? And What Will You Do With It? (nytimes.com)
200 points by SREinSF on April 2, 2018 | 106 comments



You can't trust the large companies who have a culture of profiting off your data to protect your privacy.

This is why we need AI at the edge, and not in the clouds, and Privacy-by-design thinking in our architectures. This is the only way for people to know their data won't be compromised and misused, because it never leaves their devices.

Disclaimer: I'm a co-founder of https://snips.ai and we are building a 100% on-device and private-by-design platform to build Voice AI assistants

We would be happy to know what you do with it! You can take a look at what some people have built with it already https://github.com/snipsco/awesome-snips

We are open-sourcing it over time, starting with the NLU: https://medium.com/snips-ai/snips-nlu-is-an-open-source-priv.... Snips is available in English, French, German, and soon Japanese and Korean with more European languages coming this year.

You can start building your own private-by-design smart speaker on the platform in under 1h with this tutorial: https://medium.com/snips-ai/building-a-voice-controlled-home...


A bit OT:

What's up with the animated drawing on your front page of an alien poorly disguised as a human female at a picnic using voice to control her music player? It's kind of distracting--I keep staring at it instead of reading the site.

I infer that she's an alien for three reasons.

First, her head is on backwards compared to human heads.

Second, she appears to have only one upper arm, attached to her left shoulder, which forks at the elbow into two lower arms, one of which is going behind her back over to her right side. The forearms are also about twice as long as human forearms.

Third, the neck is way longer than human necks, but quite consistent with many of the aliens some people have claimed to have seen.

Seriously...is this some recognized art style? I don't know much about art, but I do recall that around the late 19th/early 20th century there were some prominent artists and styles that took big liberties with human anatomy.

It's drawn well enough (as are the other drawings on your site) that it gives the impression the artist is referencing something known.


That's a really creepy design.


Ha ツ - thought the same thing when someone shared this site at work

https://twitter.com/whitingx/status/978983203024318464


Looks somewhat like surrealism to me.


> I'm a co-founder of https://snips.ai and we are building a 100% on-device and private-by-design platform to build Voice AI assistants

Excellent! How will you guarantee that your devices will not leak data?


We are open-sourcing the code, and people can have their own network monitoring to see that there is no data leaking


Please check into 'reproducible builds' and ways to ensure the software is not modified between the point where you build it and the point where it gets deployed. There will most likely at some point be an attempt to use your reputation to achieve the exact opposite of your stated goals so beware of that.


We'd love to hear more about what you think of this. Perhaps you could contact me at mael.primet@snips.ai? I would love to do a Skype call with you, if only because I've read quite a few of your posts over the years and am curious to get to know you.

We are providing the source, and people can rebuild it from scratch and ensure that their device is running the open-source code. In theory, each user should deploy their own device. If industrial partners want to ship our software to many devices, they will indeed need to prove they haven't modified the software, and we are happy to learn about best practices to help them do so, so we will look into this.


I'm sorry, I don't do skype or hangouts but I'll be happy to email with you. However, it is probably better if you find someone who is really an expert in the field. But I can definitely see that there is a potential problem here if you are open sourcing your code and industrial partners ship your software without some kind of clear way in which you can ascertain that what got shipped is what you intended and not some kind of perverted variation.

This might be a good starting point:

https://reproducible-builds.org/
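The core check that reproducible builds make possible is simple: if the build is deterministic, anyone can rebuild from source and compare a cryptographic digest of their output against what a vendor actually shipped. A minimal sketch, assuming you can obtain both the deployed artifact and your own rebuild as files (the file names in the usage are hypothetical); making the build deterministic in the first place is the hard part that reproducible-builds.org covers.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_rebuild(deployed: Path, rebuilt: Path) -> bool:
    """True only if the deployed artifact is byte-for-byte identical to the local rebuild."""
    return sha256_of(deployed) == sha256_of(rebuilt)
```

For example, `matches_rebuild(Path("device-image.bin"), Path("my-rebuild.bin"))` returning False means the shipped image differs from what the published source produces.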


> If industrial partners want to ship our software to many devices, they will indeed need to prove they haven't modified the software, and we are happy to learn about best practices to help them do so, so we will look into this

Yea, as a privacy conscious user this would be the attack vector that I’d be most nervous about. “Partners” taking the open source engine and loading it up with surveillance. Being able to re-compile and deploy a vanilla version from source is critical. There are a lot of “smart” devices out there that I would buy if only I had root and/or could decide what software ran on it.


Will the device have any components in it which require proprietary firmware? Think long and hard about that before answering.

edit: I thought you were also doing something similar to Mycroft and their 'mark 1' thing. I was wrong.


Our platform doesn't require any specific component.

It will be open-source and free for all users to use and create assistants.

If an industrial device maker wants to use the open-source version and add proprietary components to it, that will be their decision, and they should inform their customers.

The best we can do when building and open-sourcing our assistant platform is to make it possible for commercial vendors to create state-of-the-art private-by-design voice assistants if they want to.


I spent like 60 seconds on your page trying to figure out what exactly you do and failed. Just saying.

Put some of the information of your posts here on your page and things would be much clearer.


So no components in your device (e.g. for wifi) require proprietary firmware to function? Are you sure about that?


They make software, not hardware, they could not make such a guarantee.


Ah, I didn't realize that. I thought they were trying to do something like Mycroft with the Mark 1. Mycroft releases their platform to be installed on other devices, but they also make their own (Mark 1).


"and people can have their own network monitoring"

Is this monitoring built into your product? If not, wouldn't a user of any AI assistant have this same ability?


The monitoring is to see whether the product is making network connections. You can do that with any AI assistant, yes, but the Google and Amazon products are going to require you to allow them to make those connections. The claim is that you can see that this product isn't making connections. (I think)


You can use a tool like `mitmproxy` to monitor the network requests
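Note that `mitmproxy` only sees traffic that is routed through it; a packet capture at the router sees everything the device sends. Either way, the question being asked is "did any bytes leave the local network?". A minimal sketch of that check, assuming you have already exported connection records as (destination IP, bytes sent) pairs; that record format is an assumption, not the output of any particular tool.

```python
import ipaddress
from collections import defaultdict


def external_traffic(records):
    """Sum bytes sent per destination, keeping only destinations outside the LAN.

    `records` is an iterable of (destination_ip, bytes_sent) pairs, e.g. exported
    from a packet capture or proxy log (the format is assumed for illustration).
    """
    totals = defaultdict(int)
    for dst, sent in records:
        addr = ipaddress.ip_address(dst)
        # Private, loopback, link-local and multicast (e.g. mDNS) traffic stays local.
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_multicast:
            continue
        totals[str(addr)] += sent
    return dict(totals)
```

An empty result only says nothing left the LAN during the captured window; it says nothing about periods you didn't capture.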


Have you run into any problems improving or training your algorithm if you're limiting the data it has access to? What are some of the things you've done that make you "wish" you could be an online off-device tool?

I ask because most of what we hear is that online tools will get better over time because you are sharing your data. How do you work around this, or is that even an obstacle?


There are many ways to work around this, we do data augmentation and data generation to create more data
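The comment doesn't say exactly how Snips generates data. One common form of data generation for intent parsing is expanding hand-written templates with lists of slot values, so a few patterns yield many labeled training sentences without collecting any user audio. A minimal sketch of that general technique (the templates and slot names are hypothetical):

```python
import itertools
import re


def generate_utterances(templates, slot_values):
    """Expand templates like "turn {state} the {room} lights" with every slot combination."""
    out = []
    for template in templates:
        slots = re.findall(r"{(\w+)}", template)
        # Cartesian product over the values of each slot that appears in this template.
        for combo in itertools.product(*(slot_values[s] for s in slots)):
            out.append(template.format(**dict(zip(slots, combo))))
    return out
```

With two templates and two values each for `state` and `room`, this yields six sentences; real systems add paraphrases and noise on top.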


Although the user who complained about your using HN to post about this broke the guidelines by being uncivil, they do appear to have a point: it looks like you've been using HN exclusively to promote your own stuff. That's not what the site is for.

There's nothing wrong with posting links to your own work when it's relevant, but please don't use HN only for this. The site is intended for intellectual curiosity, i.e. submitting, reading, and commenting on stories that you run across and personally find interesting. So if you use it exclusively for promotion, you're not really participating in the community as intended. Does that make sense?


Oh, this looks cool. I was looking into building something with a respeaker not too long ago. How does your business model work that you're letting people DIY for free?


We are selling licenses for industrial builders.

If you build something with it, we would love to hear about it and feature it on our website! Come talk to us on our Discord channel :)


Any reasons to choose Discord over Slack?

Wondering as we have to make a similar decision for our startup.


A USB microphone... That's a bummer. I was hoping that Zigbee or Z-wave based mics already exist. Do they? Every time I look at the landscape of home automation it's really hard to avoid clouds and Wifi.


I had problems setting up Snips on an RPI - couldn't figure out how to read the MQTT bus messages after I set up the audio successfully. Where do I go for help?


Hi! You can either send us an issue on Github https://github.com/snipsco/snips-platform-documentation/issu... or better come talk to us on our Discord channel: https://discordapp.com/invite/3939Kqx
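One quick way to see what is on the bus is to subscribe to a broad wildcard filter, e.g. `mosquitto_sub -h localhost -t 'hermes/#' -v` against the device's broker (the host and the `hermes/...` topic prefix are assumptions here; check them against the Snips documentation). If a narrower subscription shows nothing, the filter is often the culprit. The sketch below implements the MQTT topic-filter rules (`+` matches exactly one level, a trailing `#` matches the rest, including the parent level) so you can test a filter against a topic; the topic names used are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Filters starting with a wildcard do not match topics beginning with '$' (e.g. $SYS).
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # multi-level wildcard: matches this level and everything below
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if f != "+" and f != t_levels[i]:
            return False  # literal level mismatch
    return len(f_levels) == len(t_levels)
```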


what's your business model if everyone can download and build their own image, or even re-distribute that in their own products?

as far as Alexa goes, I unplugged it a few months ago and it's collecting dust now. I use my cellphone to check the weather and as an alarm clock instead; I don't see much use for Alexa beyond those two simple tasks.


> far as Alexa goes, I unplugged it a few months ago and it's collecting dust now

Ha, I got a google home mini free as part of some promotion, and I haven't even broken the shrink wrap. I need to find a way to sell that thing for a few bucks before it becomes outdated, because I sure have no interest in it.


We are selling licenses for industrial makers


any plans to integrate with https://www.home-assistant.io/ ?


We have an integration with Home Assistant and are now working on making this even easier, stay tuned and subscribe to our newsletter to be informed when the integration is complete


"Your ideas are intriguing to me, and I wish to subscribe to your newsletter."



> I'm a co-founder of https://snips.ai

Yes, we know. You post to literally every voice-assistant related comment thread. If you aren't a bot, you must have a (voice powered?) bot sniffing out threads to post on.

If you're going to do that, please at least learn the difference between a disclaimer and a disclosure. :sigh:


The things people are willing to give up for the most minimal of conveniences are only going to get worse in the next decade.

I wouldn't be willing to put an open wiretap in my home, even if it did something amazing, like extend my lifespan. The quality of my life is not significantly improved with these devices, and all of their practical uses can be duplicated with the minimal effort of tapping a screen a couple of times.


> I wouldn't be willing to put an open wiretap in my home... all of their practical uses can be duplicated with the minimal effort of tapping a screen a couple of times.

I've never understood this argument. Why is the portable wiretap in your pocket inherently safer?


Not the Parent commenter, but I can give some thoughts:

- The phone is not designed to listen around the room explicitly, more near field, so it can't hear everything.

- An iPhone seems to have pretty decent privacy settings (only the app can listen while app is open, etc)

- I personally have a phone I can reflash with mostly open source software (only the baseband/drivers are proprietary), so I am reasonably sure that my phone is not spying on me.


It may not be designed to "listen around the room explicitly", but my phone makes a decent speakerphone, so it still has that capability. And to be honest, I'd be less concerned with what my Echo hears in my living room than with what my phone hears all day long.

The Echo has pretty decent privacy settings (only listens when you say the "wake word", etc).


I don't disagree with phone vs stationary speaker in your house, but the "wake word" issue is precisely what the article deals with.


The article is talking about patents filed that may or may not ever end up in products, not current technology.

But any wider use of audio processing to listen for keywords besides the wake word applies equally well to a cell phone. So again, the cell phone is at least as big a concern as an Echo, if not bigger, since many people are rarely out of arm's reach of their phone.


Why are you so sure the baseband isn’t spying on you?


Frankly, I am not 100% sure it isn't, I am just reasonably sure. I think to do that is beyond the capability of the type of attacker I am worried about. Google/Amazon/Facebook/name your favorite commercial spying company won't exploit it, as they have their apps to do that sort of stuff. I don't think anyone else with that capability (i.e. name your favorite "spooky" government agency) is coming after me. Hence, reasonably sure.


>Why is the portable wiretap in your pocket inherently safer?

Who says it is? It's astounding to me that people willingly carry around portable (wi-fi enabled!) wiretaps.


Not safer, but there is a key difference.

If I turn off voice assistant on my phone, ethically speaking, it shouldn't be recording my voice. Companies/governments doing so is an actual wire tap, and they should have a warrant.

Putting these devices in your home is giving Amazon/Google/Apple permission to record your every conversation.


That's not true. Those devices are designed on the same ethical grounds you claimed for the phone case. They're only supposed to listen after you invoke them. Regardless if this is true or not, a phone is millions of times more dangerous for privacy if it's being used to record you and track you.


> They're only supposed to listen after you invoke them.

How can a device that is only activated by listening not always be listening? If I turn off Siri on my phone the microphone is supposed to go off.

Edit: I know about the wake word. It was a statement on the quote that a device that only works on voice wouldn't be "listening".


Wake word detection (listening for a specific phrase, such as "Hey Siri") typically happens using a completely different subsystem from active voice recognition, since it's a much more bounded problem that also needs to run with far lower power consumption. (in the case of Siri specifically, Apple actually has a pretty nifty whitepaper: https://machinelearning.apple.com/2017/10/01/hey-siri.html)

That said: right now, common sense would dictate that your phone's battery really can't handle recording full audio at all times, but once that's no longer the case, it does seem problematic that we don't yet have the audio equivalent of putting a piece of tape over your laptop webcam.
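The two-stage design described above can be sketched as a gate: a cheap per-frame detector runs continuously, a short pre-roll is held in a local ring buffer, and nothing is passed on to full recognition until the detector fires. This is a simplified illustration, not Apple's or Amazon's implementation; `frames` and `is_wake_word` are hypothetical stand-ins for real audio frames and a real detector.

```python
from collections import deque


def gate_audio(frames, is_wake_word, preroll=2, session_len=3):
    """Yield only the frames that would be forwarded for full recognition.

    Frames before a trigger are kept in a small ring buffer and silently
    overwritten; only the pre-roll, the trigger frame and the next
    `session_len` frames are ever yielded.
    """
    buffer = deque(maxlen=preroll)
    remaining = 0
    for frame in frames:
        if remaining > 0:
            yield frame
            remaining -= 1
        elif is_wake_word(frame):
            yield from buffer  # flush the pre-roll that led up to the trigger
            buffer.clear()
            yield frame
            remaining = session_len
        else:
            buffer.append(frame)  # the oldest frame falls off the end, never sent
```

The privacy property rests entirely on what `is_wake_word` tests for; swap in a detector for other words and the same gate forwards different audio, which is the concern raised further down the thread.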


Remote activation of the microphone, wake word or not.

https://www.cnet.com/news/fbi-taps-cell-phone-mic-as-eavesdr...

I don't know how easy it is with today's smartphones, but it was definitely a thing in the dumbphone era.

Similar issues with your laptop or these smart speaker devices.


It's always listening for the keyword. The listening isn't problematic, it's the sharing.


I don't have Alexa (though I assume it has the same), but you definitely can turn off mic on Google Home.


You're living in a fantasy world if you think that fiddling with the settings deters, in the slightest, any corporation or government from using your portable wiretapping device to listen to you and track everything else it's technically capable of tracking (with its gyroscope and other sensors of all kinds).


I'm seriously shocked by the number of people that are putting always on "assistants" in their home - but more so on the number of people that should know better, like HN people.


One of my friends has one of these because she likes the way it helps her build a grocery shopping list over the week. She once thought of herself as a privacy activist, and is fully conscious of the hypocrisy. It really made me think twice about how small the convenience has to be for someone to compromise their privacy.


What's shocking about it? For just the Echo, Amazon claims to have sold 20 million devices so they are probably in at least 10 million homes. How many of the people that buy an Echo have been harmed so far?

I have an Echo and use it daily for music, weather, timers, home automation (ie controlling lights and fans), etc...

I also drive a car almost every day and that has a relatively high probability of killing me. Compared to that, having an Echo in my house seems somewhat harmless.


When the German government started gathering data about the religion of its citizens, well before WW2, how many people were harmed by filling out that one field on the form? Zero. No one was harmed by the collection of that data. Except that then WW2 arrived and this data was used to exterminate entire communities. In my own country (Poland), the secret police used to gather all data on people and do nothing with it, until you gave them a reason to; then they had entire archives of recorded conversations, intercepted letters, and logs of visits and journeys, ready to be browsed in search of anything that could be used against you.


>How many of the people that buy an Echo have been harmed so far?

That entirely depends on your definition of "harm". Those of us who see intrinsic value in privacy would be grievously harmed.


I don't think there's any way to tell if there has been concrete harm, which is the biggest problem. You can't trust the platform.


I think the definition of harm has to include some negative consequence. Otherwise you could just say "those of us that are offended by the color blue are grievously harmed every time the Echo is activated".

Even if we accept that sharing anything private is harm, I'm guessing most purchasers of the Echo understand that the device uses the internet to answer queries. How are those foolish people being harmed?


As a previous commenter said -- why are you not similarly shocked by the number of HN users that carry a remote recording (audio + video) device + GPS tracker in their pocket?

Why should I be less concerned about a mobile device that waits for me to say "Ok Google" or "Siri" than a stationary device that waits for me to say "Alexa"?


Even if you know better and act like it, you're still going to be monitored while visiting friends.


Damn. I would. I probably did anyways by carrying my cell phone with me everywhere.

The problem is not understanding the tradeoffs or even not understanding that there's a choice at all.


In past discussions, I was led to believe we were safe from these devices, in terms of pervasive data collection, because the main processor that was always running was primitive and was only powerful enough to pick up a key phrase. This phrase would power the main processor which would then listen to your query, respond, and then deactivate. But now, as the article mentions, I realize that this initial processor could be made to listen for many key words like "love", "hate", or other words that would pick up on sensitive personal information. I really don't think these devices should exist.


> because the main processor that was always running was primitive and was only powerful enough to pick up a key phrase

That's only true if you're trying to do complex analysis on the device. Cell phones in the early 1990s had even less computing power, but it was enough to encode speech down to 13 kbit/s[1] (or 5.6 kbit/s[2]). A simple noise gate[3] would reduce the recording duty cycle to maybe 1% while costing a trivial amount of CPU load.

Modern hardware, even tiny embedded devices, can probably do a lot more than simple gated+compressed audio.

[1] https://en.wikipedia.org/wiki/Full_Rate

[2] https://en.wikipedia.org/wiki/Half_Rate

[3] https://en.wikipedia.org/wiki/Noise_gate
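To make the "maybe 1%" duty-cycle idea concrete, here is a minimal frame-energy noise gate: split the signal into short frames, compute each frame's RMS level, and count how often it exceeds a threshold (i.e. how often the gate would be open). The frame length of 160 samples is 20 ms at 8 kHz; the threshold is an arbitrary illustrative value, not a figure from the thread.

```python
import math


def gate_duty_cycle(samples, frame_len=160, threshold=0.02):
    """Fraction of full frames whose RMS level exceeds `threshold`.

    `samples` are floats in [-1, 1]; a trailing partial frame is ignored.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        return 0.0
    open_frames = 0
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms > threshold:
            open_frames += 1
    return open_frames / len(frames)
```

For example, 99 silent frames followed by one loud frame gives a duty cycle of 0.01.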


So let's assume this is technically possible, and only examine the process under the lens of detectability. Assuming the codec is 8kbps, you'd see roughly 3MB upload for every hour of audio, and I would say it's not out of the question for the noise gate to be active from the television being on in the same room, or music playing. For an utterance, seeing a 3MB upstream would be super abnormal. It would be impossible to transmit all of this data for processing without somebody noticing.
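The "roughly 3MB" figure checks out: at 8 kbit/s, one hour is 8,000 × 3,600 / 8 = 3,600,000 bytes (about 3.4 MiB). The same arithmetic also shows why the noise-gate point upthread matters for detectability: at a 1% duty cycle, that hour shrinks to about 36 KB. A small helper for trying other bitrates and duty cycles:

```python
def upload_bytes(bitrate_bps: float, hours: float, duty_cycle: float = 1.0) -> float:
    """Bytes uploaded for `hours` of audio at `bitrate_bps`, transmitting only `duty_cycle` of the time."""
    seconds = 3600 * hours
    return bitrate_bps * seconds * duty_cycle / 8  # 8 bits per byte
```

For instance, `upload_bytes(8000, 1)` is 3.6 million bytes, and `upload_bytes(13000, 1)` (the GSM Full Rate bitrate) is 5.85 million.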


While a noise gate is an obvious starting point, I suspect that much smarter filters are possible. A wide range of design choices are possible that trade CPU <-> complexity <-> accuracy. If a high error rate (including "no data") is acceptable, a filter might simply default to "off" when any unusual condition like "noise gate has been on continuously for 15 minutes" is detected.

If I was designing this kind of spyware, I would put a hard limit that cuts off uploading any "extra" data after sending some low multiple of the "legitimate" data.


Why would it record continuously? If I were developing such a device, I'd use random sampling. You don't need all the audio from every home, but a little bit from everywhere can be useful. It could be used to improve the acoustic model of the room, maybe, or to improve ad targeting[1]. Some snippets from the minutes before an activation could be especially interesting, to understand the context.

1. Or more nefarious things, of course.


Or record audio, do speech to text, some light analysis, and upload the metadata. Or just upload a basket of words and their frequency, "Toilet Paper, 2, tv, 7, online, 4,"
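A sketch of the "basket of words" payload, assuming a transcript already exists and a hypothetical watch list of single words. The point for the detectability argument above is size: the serialized result is a few dozen bytes, orders of magnitude below streaming audio.

```python
from collections import Counter


def word_basket(transcript: str, vocabulary: set) -> str:
    """Count watched words in a transcript and serialize as 'word, n, word, n'."""
    words = (w.strip(".,!?").lower() for w in transcript.split())
    counts = Counter(w for w in words if w in vocabulary)
    # most_common() orders by count, highest first.
    return ", ".join(f"{w}, {n}" for w, n in counts.most_common())
```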


Detecting who speaks is also a (relatively) lightweight analysis. Combined with bag-of-words, it can build a personality/interest profile. Speech-based gender detection can also be done, and probably distinguishing kids from adults. Now you have good data about the demographics of the household.


> including an “algorithmic transparency requirement” that would help people understand how their data was being used and what automated decisions were then being made about them.

This needs to be required for any type of algorithmic decision making. Without algorithmic transparency ulterior motives, intentional or unintentional biases, and unnoticed mistakes are hidden from public review.

A common response is that we don't know how some types of machine learning make their decisions. I agree this is occasionally true. Find a way to generate an explanation, or use a different algorithm; transparency is a critical requirement.
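As an example of the "use a different algorithm" route: with a linear scoring model, the score decomposes exactly into one weight × value term per feature, so listing those terms is a faithful explanation of the decision by construction. A minimal sketch; the feature names and weights in the usage are hypothetical.

```python
def explain_linear(weights: dict, features: dict, bias: float = 0.0):
    """Score a linear model and list each feature's contribution, largest magnitude first."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked
```

For instance, with weights `{"income": 0.5, "late_payments": -3.0, "account_age_years": 0.1}` and one late payment, the explanation leads with `late_payments` at -3.0, telling the person exactly which factor drove the outcome.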


Companies use patents defensively and the Times is fully aware of that so this comes across as cynical clickbait.

What I find interesting is that they keep quoting Consumer Watchdog, an anti-Google organization turned anti-technology (they're against self-driving cars and robots now).

The scary thing is that mainstream, respected news outlets casually traffic in technophobia, as is evident from articles about automation and AI, for example, and from how every piece mentioning a tech company is permeated with FUD about its motivations or its "power".


That is a lot of buzzwords.


Why don't we have open-source voice-assistants yet? I mean, if we can have an open-source OS (e.g. Linux), then surely we can have open-source speech recognition, right?


We are building this at https://snips.ai (disclaimer: I'm a co-founder). You can build 100% on-device Voice AI assistants which are running on a Raspberry Pi 3, and we are open-sourcing the platform

You can take a look at our blog to get started if you want to build your own assistant: https://medium.com/snips-ai/building-a-voice-controlled-home...


Interesting!

Under what license are you open-sourcing the platform?

And what is your business model?

How well do you perform in benchmarks? (Assuming there are benchmarks)


It will be mostly GPL, the business model is to sell licenses for industrial device builders!


Voice assistants are difficult, both electro-acoustically to get a good clean voice signal from the ambient room noise and voice-assistant generated audio (this is called barge-in), as well as the software to actually parse that speech. A great open-source voice-assistant is ambitious.

But why isn't there a great open-source version of any $iot_appliance? IOT, in general, is crawling with companies that are looking for a monthly cash flow, and are therefore creating an ecosystem of non-federated, walled garden type devices that report back to some central server to keep the user dependent upon and paying monthly cash to some company. All of these IOT projects are generally simpler than voice-assistants, and I look forward to the day when we have IOT projects that talk to each other or a local server, without linking us to some company forever.


> A great open-source voice-assistant is ambitious.

Yes, I get that. But so was writing the Linux kernel.


Linux kernel is the outlier, not the norm. You can’t take a regular runner and ask him why he isn’t running as fast as Usain Bolt.


Yes, but we only need one Usain Bolt to push the boundary of running.


More so. If you have one Usain Bolt, he can do things no one else can, but having accomplished them, he can't share his ability. If you have a software developer with Usain Bolt-level skill, he can share what he creates.



In addition to snips, there is also mycroft.ai. Both are still in the early stages, and work slightly differently. Check them both out.


There's also Sirius by University of Michigan.

It's also completely open source, and can do text, voice, and picture search.


one more I wasn't aware of. We will see which gets anywhere.


Mozilla is making one. The first release last year was already very good, and hopefully it will improve with time. https://hacks.mozilla.org/2017/11/a-journey-to-10-word-error...


Not exactly. Mozilla is making a good speech to text engine. (the current open source engines are based on some old academic work that is now considered not the right way forward).
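The linked Mozilla post appears to be about word error rate (WER), the standard accuracy metric for speech-to-text engines like this one. WER is the word-level edit distance between the reference transcript and the engine's output (substitutions + deletions + insertions), divided by the number of reference words. A minimal sketch of that standard definition:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion of r
                          curr[j - 1] + 1,         # insertion of h
                          prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, "the cat sat on the mat" recognized as "the cat sit on mat" has one substitution and one deletion, so WER = 2/6 ≈ 33%. Because insertions count, WER can exceed 100%.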


Well the second part of amelius's question asked about speech recognition.


With a court order can authorities turn on the microphone and just listen to everything? Seems easy to do but I haven't heard about it yet. I guess phones can do the same. Presumably the Russians and Chinese do this already. :)


A warrant is required only as long as "voice assistant" technology "is not in general public use"[1]. Kyllo v United States created a bright-line[3] test that removes the warrant requirement to see "details of a private home that would previously have been unknowable without physical intrusion"[2] when use of the technology is normalized.

Note that this removes the warrant requirement in general, even if you personally don't own an Alexa/etc. The test is if the public expects audio in the home might be recorded and sent to a 3rd party. If the answer is "yes", then the police can use their own hardware to do the recording.

[1] http://caselaw.findlaw.com/us-supreme-court/533/27.html

[2] Ibid.

[3] https://en.wikipedia.org/wiki/Bright-line_rule


> The test is if the public expects audio in the home might be recorded and sent to a 3rd party. If the answer is "yes", then the police can use their own hardware to do the recording.

Honest question, are you an attorney or can one comment? I am not, but I don't read this opinion the same way.

The case referenced concerns using thermal imaging from outside a home to look inside it. I guess it follows that if thermal imaging were super common, they might argue it's not a big deal for a cop to point one at a house, but thermal imaging is not common, so it was ruled an unconstitutional search.

But you seem to have taken that as the police being able to enter a residence and install a transmitting microphone just because most people have transmitting microphones inside the house. There's a huge difference in a search that requires entering a home and one that does not.

SCOTUS actually seems pretty explicitly concerned in the text about technology eroding the expectation of privacy by progressing deeper into a private residence without having to enter it, which again implies this case was about technology being used to search a home without entering it.


Courts tend to look at precedent. Any lawyer taking a case in this area will look at the case in question and try to twist it to state their side.

This isn't to say courts will rule one way or the other, or that the courts won't change their minds, but a well-reasoned opinion from a different court is powerful.


Amazon resisted a warrant from a prosecutor seeking all Alexa recordings from a house that was the site of a murder. The trial was in Arkansas, I believe.

I don't think it ever went anywhere precedent setting however because the defendant eventually gave permission to Amazon to turn it over.

Still pretty muddy waters. I would trust Amazon a little more than I would trust the smart tv makers, but that's just my bias.


You're right, they publicly resisted. But I have no doubt that they could also just quietly hand a DVD to a prosecutor and say "here ya go".

In the end, this is their platform of which we are sharecroppers.


> I would trust Amazon

Note that re: Kyllo v United States, trust in any specific manufacturer is irrelevant. The SCOTUS ruled that use of a technology, not a product, requires a warrant when it is "not in general public use".

The Kyllo case involved two federal agents that used their own thermal imaging camera to search a house for the presence of people and grow lights.


Glancing at the supreme court case, I'm not sure that observations that police officers can do from a public place have a good application to this area.

I think the public expectation part was specifically whether thermal imaging was too invasive and not an observation that the general public would make from the street.


A warrant or a hack. Don't trust the manufacturers to be able to secure these devices.


Why would you possibly think you'll hear about times governments forced companies to turn these into wiretaps? You won't until you do - but then you'll know it was going on for years.


If you're not planning to release your own hardware, could you provide a list of devices that the software is tested with? I looked on the site a bit, but the crazy art thing triggered my ADHD and OCD as well. I imagine the idea is that you can compile it for any device or platform with some changes, but without a go-to, known-good platform for testing builds and POCs, community engagement is harder.


One application details how audio monitoring could help detect that a child is engaging in “mischief” at home by first using speech patterns and pitch to identify a child’s presence, one filing said. A device could then try to sense movement while listening for whispers or silence, and even program a smart speaker to “provide a verbal warning.”

This worries me. First the children are watched 24/7, then the adults.


I had a dream that Alexa recognized my voice when I was visiting a friend's house. Won't be long for that one.


Your phone already knows you visited your friend's house, why are you worried about Alexa knowing too?


That's funny.

I wonder how human intrigues (and the literature/narratives around it) will be affected by ubiquitous computing and networking.


Alexa could enable subtle insider trading by both Amazon employees and Amazon AIs.



