You can't trust the large companies who have a culture of profiting off your data to protect your privacy.
This is why we need AI at the edge, not in the cloud, and privacy-by-design thinking in our architectures. It is the only way for people to know their data won't be compromised and misused, because it never leaves their devices.
Disclaimer: I'm a co-founder of https://snips.ai and we are building a 100% on-device and private-by-design platform to build Voice AI assistants
What's up with the animated drawing on your front page of an alien poorly disguised as a human female at a picnic using voice to control her music player? It's kind of distracting--I keep staring at it instead of reading the site.
I infer that she's an alien for three reasons.
First, her head is on backwards compared to human heads.
Second, she appears to have only one upper arm, attached to her left shoulder, which forks at the elbow into two lower arms, one of which is going behind her back over to her right side. The forearms are also about twice as long as human forearms.
Third, the neck is way longer than human necks, but quite consistent with many of the aliens some people have claimed to see.
Seriously...is this some recognized art style? I don't know much about art, but I do recall that around the late 19th/early 20th century there were some prominent artists and styles that took big liberties with human anatomy.
It's sufficiently well drawn (as are the other drawings on your site) that it gives the impression the artist is referencing something known.
Please check into 'reproducible builds' and ways to ensure the software is not modified between the point where you build it and the point where it gets deployed. There will most likely at some point be an attempt to use your reputation to achieve the exact opposite of your stated goals so beware of that.
We'd love to hear more about what you think of this. Perhaps you can contact me at mael.primet@snips.ai; I would love to do a Skype call with you, if only because I've read quite a few of your posts over the years and am curious to meet you.
We are providing the source, and people can rebuild it from scratch to ensure that their device is running the open-source code. In theory, each user should deploy to his own device. If an industrial partner wants to ship our software on many devices, it will indeed need to prove it hasn't modified the software. We are happy to learn about best practices to help partners do so, and we will look into this.
I'm sorry, I don't do skype or hangouts but I'll be happy to email with you. However, it is probably better if you find someone who is really an expert in the field. But I can definitely see that there is a potential problem here if you are open sourcing your code and industrial partners ship your software without some kind of clear way in which you can ascertain that what got shipped is what you intended and not some kind of perverted variation.
> If an industrial partner wants to ship our software on many devices, it will indeed need to prove it hasn't modified the software. We are happy to learn about best practices to help partners do so, and we will look into this
Yea, as a privacy conscious user this would be the attack vector that I’d be most nervous about. “Partners” taking the open source engine and loading it up with surveillance. Being able to re-compile and deploy a vanilla version from source is critical. There are a lot of “smart” devices out there that I would buy if only I had root and/or could decide what software ran on it.
Our platform doesn't require any specific component.
It will be open-source and free for all users to use and create assistants.
If an industrial device maker wants to use the open-source version and add proprietary components to it, that will be their decision, and they should inform their customers.
The best we can do when building and open-sourcing our assistant platform is to make it possible for commercial vendors to create state-of-the-art, private-by-design voice assistants if they want to.
Ah, I didn't realize that. I thought they were trying to do something like Mycroft with the Mark 1. Mycroft releases their platform to be installed on other devices, but they also make their own (Mark 1).
The monitoring is to see whether the product is making network connections. You can do that with any AI assistant, yes, but the Google and Amazon products are going to require you to allow those connections. The claim is that you can see that this product isn't making connections. (I think)
Have you run into any problems improving or training your algorithm when you're limiting the data it has access to? What are some of the things you've done that made you "wish" you could be an online, off-device tool?
I ask because most of what we hear is that online tools will get better over time because you are sharing your data. How do you work around this, or is that even an obstacle?
Although the user who complained about your using HN to post about this broke the guidelines by being uncivil, they do appear to have a point: it looks like you've been using HN exclusively to promote your own stuff. That's not what the site is for.
There's nothing wrong with posting links to your own work when it's relevant, but please don't use HN only for this. The site is intended for intellectual curiosity, i.e. submitting, reading, and commenting on stories that you run across and personally find interesting. So if you use it exclusively for promotion, you're not really participating in the community as intended. Does that make sense?
Oh, this looks cool. I was looking into building something with a respeaker not too long ago. How does your business model work that you're letting people DIY for free?
A USB microphone... That's a bummer. I was hoping that Zigbee or Z-wave based mics already exist. Do they? Every time I look at the landscape of home automation it's really hard to avoid clouds and Wifi.
I had problems setting up Snips on an RPI - couldn't figure out how to read the MQTT bus messages after I set up the audio successfully. Where do I go for help?
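One way to watch the bus while you wait for an answer, assuming a default Snips setup with a local Mosquitto broker on port 1883 and Hermes-style topics under `hermes/` (both of these are assumptions about the default install): dump raw traffic with `mosquitto_sub -h localhost -t 'hermes/#' -v`, and decode an intent payload in Python. The field names used below ("intent", "intentName", "input") are illustrative, not an official schema:

```python
import json

# Hypothetical helper: decode a Hermes-style intent payload, such as one
# published on a topic like hermes/intent/<name>. The payload shape here
# is an assumption for illustration, not documented behavior.
def parse_intent(payload: bytes):
    msg = json.loads(payload)
    return msg["intent"]["intentName"], msg.get("input", "")

# Example payload of the assumed shape:
sample = b'{"intent": {"intentName": "turnOnLights"}, "input": "turn on the lights"}'
name, text = parse_intent(sample)  # ("turnOnLights", "turn on the lights")
```
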
what's your business model if everyone can download and build their own image, or even re-distribute that in their own products?
as far as Alexa goes, I unplugged it a few months ago and it's collecting dust now. I use my cellphone to check the weather and as an alarm clock instead; I don't see much use for Alexa other than those two simple tasks.
> far as Alexa goes, I unplugged it a few months ago and it's collecting dust now
Ha, I got a google home mini free as part of some promotion, and I haven't even broken the shrink wrap. I need to find a way to sell that thing for a few bucks before it becomes outdated, because I sure have no interest in it.
We have an integration with Home Assistant and are now working on making this even easier. Stay tuned and subscribe to our newsletter to be informed when the integration is complete.
Yes, we know. You post to literally every voice-assistant related comment thread. If you aren't a bot, you must have a (voice powered?) bot sniffing out threads to post on.
If you're going to do that, please at least learn the difference between a disclaimer and a disclosure. :sigh:
The things people are willing to give up for the most minimal of conveniences are only going to get worse in the next decade.
I wouldn't be willing to put an open wiretap in my home, even if it did something amazing, like extend my lifespan. The quality of my life is not significantly improved with these devices, and all of their practical uses can be duplicated with the minimal effort of tapping a screen a couple of times.
> I wouldn't be willing to put an open wiretap in my home... all of their practical uses can be duplicated with the minimal effort of tapping a screen a couple of times.
I've never understood this argument. Why is the portable wiretap in your pocket inherently safer?
Not the Parent commenter, but I can give some thoughts:
- The phone is not designed to explicitly listen around the room; it's more of a near-field microphone, so it can't hear everything.
- An iPhone seems to have pretty decent privacy settings (only the app can listen while the app is open, etc.)
- I personally have a phone I can reflash with mostly open-source software (only the baseband/drivers are proprietary), so I am reasonably sure that my phone is not spying on me.
It may not be designed to "listen around the room explicitly", but my phone makes a decent speakerphone, so it still has that capability. And to be honest, I'd be less concerned with what my Echo hears in my living room than with what my phone hears all day long.
The Echo has pretty decent privacy settings (only listens when you say the "wake word", etc).
The article is talking about patents filed that may or may not ever end up in products, not current technology.
But any wider use of audio processing to listen for keywords besides the wake word applies equally well to a cell phone. So again, the cell phone is at least as big a concern as an Echo, if not bigger, since many people are rarely outside arm's reach of their phone.
Frankly, I am not 100% sure it isn't, I am just reasonably sure. I think to do that is beyond the capability of the type of attacker I am worried about. Google/Amazon/Facebook/name your favorite commercial spying company won't exploit it, as they have their apps to do that sort of stuff. I don't think anyone else with that capability (i.e. name your favorite "spooky" government agency) is coming after me. Hence, reasonably sure.
If I turn off the voice assistant on my phone, ethically speaking, it shouldn't be recording my voice. Companies/governments doing so anyway is an actual wiretap, and they should need a warrant.
Putting these devices in your home is giving Amazon/Google/Apple permission to record your every conversation.
That's not true. Those devices are designed on the same ethical grounds you claimed for the phone case. They're only supposed to listen after you invoke them.
Regardless of whether this is true, a phone is millions of times more dangerous for privacy if it's being used to record you and track you.
Wake word detection (listening for a specific phrase, such as "Hey Siri") typically happens using a completely different subsystem from active voice recognition, since it's a much more bounded problem that also needs to run with far lower power consumption. (in the case of Siri specifically, Apple actually has a pretty nifty whitepaper: https://machinelearning.apple.com/2017/10/01/hey-siri.html)
That said: right now, common sense would dictate that your phone's battery really can't handle recording full audio at all times, but once that's no longer the case, it does seem problematic that we don't yet have the audio equivalent of putting a piece of tape over your laptop webcam.
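To make the two-stage architecture described above concrete, here is a toy sketch. This is pure illustration under stated assumptions: the string match stands in for a real low-power keyword-spotting model, and all function names are invented:

```python
# Toy two-stage pipeline: a cheap always-on detector gates a more
# expensive recognizer, mirroring the wake-word architecture described
# above. The exact-phrase match below is a stand-in for real DSP.
def cheap_wake_word_detector(frame: str) -> bool:
    # "low-power co-processor": only checks for the wake phrase
    return "hey siri" in frame.lower()

def expensive_recognizer(frame: str) -> str:
    # "main processor": only runs after the wake word fires
    return f"recognized: {frame}"

def pipeline(frames):
    results = []
    for frame in frames:
        if cheap_wake_word_detector(frame):
            results.append(expensive_recognizer(frame))
    return results
```

The design point is that the expensive stage never runs on frames the cheap stage rejects, which is why the always-on part can stay within a tiny power budget.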
You're living in a fantasy world if you think that fiddling with the settings deters, in the slightest, any corporation or government from using your portable wiretapping device to listen to you and track everything else it's technically capable of tracking (with its gyroscope and inertial measurement devices of all kinds).
I'm seriously shocked by the number of people putting always-on "assistants" in their homes, and even more so by the number of people who should know better, like HN people.
One of my friends has one of these because she likes the way it helps her build a grocery shopping list over the week. She once thought of herself as a privacy activist, and is fully conscious of the hypocrisy. It really made me think twice about how small the convenience has to be for someone to compromise their privacy.
What's shocking about it? For just the Echo, Amazon claims to have sold 20 million devices so they are probably in at least 10 million homes. How many of the people that buy an Echo have been harmed so far?
I have an Echo and use it daily for music, weather, timers, home automation (ie controlling lights and fans), etc...
I also drive a car almost every day and that has a relatively high probability of killing me. Compared to that, having an Echo in my house seems somewhat harmless.
When the German government started gathering data about the religion of its citizens, way before WW2, how many people were harmed by filling out that one field on the form? Zero. No one was harmed by the collection of that data. Then WW2 arrived, and this data was used to exterminate entire communities. In my own country (Poland), the secret police used to gather all data on people and do nothing with it - until you gave them a reason to; then they had entire archives of recorded conversations, intercepted letters, and logs of visits and journeys, ready to be searched for anything that could be used against you.
I think the definition of harm has to include some negative consequence. Otherwise you could just say "those of us that are offended by the color blue are grievously harmed every time the Echo is activated".
Even if we accept that sharing anything private is harm, I'm guessing most purchasers of the Echo understand that the device uses the internet to answer queries. How are those foolish people being harmed?
As a previous commenter said -- why are you not similarly shocked by the number of HN users that carry a remote recording (audio + video) device + GPS tracker in their pocket?
Why should I be less concerned about a mobile device that waits for me to say "Ok Google" or "Siri" than a stationary device that waits for me to say "Alexa"?
In past discussions, I was led to believe we were safe from these devices, in terms of pervasive data collection, because the processor that was always running was primitive and only powerful enough to pick up a key phrase. That phrase would wake the main processor, which would then listen to your query, respond, and deactivate. But now, as the article mentions, I realize that this initial processor could be made to listen for many keywords like "love", "hate", or other words that would pick up sensitive personal information. I really don't think these devices should exist.
> because the main processor that was always running was primitive and was only powerful enough to pick up a key phrase
That's only true if you're trying to do complex analysis on the device. Cell phones in the early 1990s had even less computing power, but it was enough to encode speech down to 13.2 kbit/s[1] (or 5.6 kbit/s[2]). A simple noise gate[3] would reduce recording duty cycle down to maybe 1% while costing a trivial amount of CPU load.
Modern hardware - even tiny embedded devices - can probably do a lot more than simple gated+compressed audio.
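The noise-gate idea is simple enough to sketch in a few lines: keep only audio frames whose RMS energy exceeds a threshold. The frame size and threshold below are purely illustrative values, not tuned parameters:

```python
import math

# Minimal noise-gate sketch: drop frames whose RMS energy is below a
# threshold. 160 samples is ~20 ms at an 8 kHz sample rate; the 0.02
# threshold is arbitrary, chosen only for illustration.
def gate(samples, frame_len=160, threshold=0.02):
    kept = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms >= threshold:
            kept.extend(frame)  # loud frames pass through
    return kept                 # silent frames are dropped entirely

silence = [0.0] * 1600   # gate(silence) drops everything
speech = [0.5] * 1600    # gate(speech) keeps all 1600 samples
```

A real implementation would add hysteresis and hangover time so it doesn't chop words mid-syllable, but even this naive version shows why the duty cycle of recording can drop dramatically for a trivial CPU cost.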
So let's assume this is technically possible, and examine the process only under the lens of detectability. Assuming the codec is 8 kbps, you'd see roughly 3.6 MB of upload for every hour of audio, and it's not out of the question for the noise gate to be active because a television is on in the same room, or music is playing. For a single utterance, a multi-megabyte upstream would be highly abnormal. It would be impossible to transmit all of this data for processing without somebody noticing.
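The bandwidth figure is easy to sanity-check with back-of-the-envelope arithmetic:

```python
# One hour of continuous audio at an 8 kbit/s codec, in megabytes.
bitrate_bits_per_s = 8_000
seconds_per_hour = 3_600
megabytes = bitrate_bits_per_s * seconds_per_hour / 8 / 1_000_000
# megabytes == 3.6, i.e. a few MB per hour of always-on recording
```
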
While a noise gate is an obvious starting point, I suspect that much smarter filters are possible. A wide range of design choices are possible that trade CPU <-> complexity <-> accuracy. If a high error rate (including "no data") is acceptable, a filter might simply default to "off" when any unusual condition like "noise gate has been on continuously for 15 minutes" is detected.
If I was designing this kind of spyware, I would put a hard limit that cuts off uploading any "extra" data after sending some low multiple of the "legitimate" data.
Why would it record continuously? If I were developing such a device, I'd use random sampling. There's no use for all the audio in every home, but a little bit from everywhere can be useful. It could be used to improve the acoustic model of the room, maybe, or to improve targeting of ads[1]. Some snippets from the minutes before an activation could be especially interesting, to understand the context.
Or record audio, do speech-to-text and some light analysis, and upload only the metadata. Or just upload a basket of words and their frequencies: "toilet paper, 2, tv, 7, online, 4".
Detecting who is speaking is also a (relatively) lightweight analysis. Combined with a bag-of-words, it can build a personality/interest profile.
Speech-based gender detection can also be done, and probably distinguishing kids from adults as well. Now you have good data about the demographics of the household.
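The bag-of-words upload described above would be trivially cheap to compute on-device; a sketch:

```python
from collections import Counter

# Turn a transcript into (word, count) pairs -- under the hypothetical
# scheme described above, this tiny summary is all that would need
# uploading, rather than any audio.
def word_counts(transcript: str):
    return Counter(transcript.lower().split())

counts = word_counts("toilet paper tv tv online toilet paper tv")
# counts["tv"] == 3, counts["toilet"] == 2, counts["online"] == 1
```

A few bytes of word frequencies per day would be indistinguishable from ordinary telemetry traffic, which is what makes this variant much harder to detect than bulk audio upload.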
> including an “algorithmic transparency requirement” that would help people understand how their data was being used and what automated decisions were then being made about them.
This needs to be required for any type of algorithmic decision making. Without algorithmic transparency ulterior motives, intentional or unintentional biases, and unnoticed mistakes are hidden from public review.
A common response is that we don't know how some types of machine learning make their decisions. I agree this is occasionally true. Find a way to generate an explanation, or use a different algorithm; transparency is a critical requirement.
Companies use patents defensively and the Times is fully aware of that so this comes across as cynical clickbait.
What I find interesting is that they keep quoting "Consumer Watchdog", an anti-Google organization turned anti-technology (they're against self-driving cars and robots now).
The scary thing is that mainstream, respected news outlets casually traffic in technophobia, as evident from articles about automation and AI, for example, and how every piece mentioning a tech company is permeated with FUD about their motivations or their "power".
Why don't we have open-source voice-assistants yet? I mean, if we can have an open-source OS (e.g. Linux), then surely we can have open-source speech recognition, right?
We are building this at https://snips.ai (disclaimer: I'm a co-founder). You can build 100% on-device Voice AI assistants running on a Raspberry Pi 3, and we are open-sourcing the platform.
Voice assistants are difficult, both electro-acoustically (getting a good clean voice signal out of ambient room noise and the assistant's own generated audio, a problem called barge-in) and in the software to actually parse that speech. A great open-source voice assistant is ambitious.
But why isn't there a great open-source version of any $iot_appliance? IOT, in general, is crawling with companies that are looking for a monthly cash flow, and are therefore creating an ecosystem of non-federated, walled garden type devices that report back to some central server to keep the user dependent upon and paying monthly cash to some company. All of these IOT projects are generally simpler than voice-assistants, and I look forward to the day when we have IOT projects that talk to each other or a local server, without linking us to some company forever.
More so. If you have one Usain Bolt, he can do things no one else can, but having accomplished them, he can't share his ability. If you have a software developer with a Usain Bolt level of skill, he can share what he creates.
Not exactly. Mozilla is making a good speech-to-text engine. (The current open-source engines are based on some old academic work that is now considered not the right way forward.)
With a court order, can authorities turn on the microphone and just listen to everything? It seems easy to do, but I haven't heard of it happening yet. I guess phones can do the same. Presumably the Russians and Chinese do this already. :)
A warrant is required only as long as "voice assistant" technology "is not in general public use"[1]. Kyllo v United States created a bright-line[3] test that removes the warrant requirement to see "details of a private home that would previously have been unknowable without physical intrusion"[2] when use of the technology is normalized.
Note that this removes the warrant requirement in general, even if you personally don't own an Alexa/etc. The test is if the public expects audio in the home might be recorded and sent to a 3rd party. If the answer is "yes", then the police can use their own hardware to do the recording.
> The test is if the public expects audio in the home might be recorded and sent to a 3rd party. If the answer is "yes", then the police can use their own hardware to do the recording.
Honest question: are you an attorney, or can one comment? I am not, but I don't read this opinion the same way.
The case referenced concerns using thermal imaging from outside a home to look inside it. I guess it makes sense, then, that if thermal imaging were super common, maybe they would argue that it's not a big deal for a cop to point one at a house; but thermal imaging is not common, so it was ruled an unconstitutional search.
But you seem to have taken that as the police being able to enter a residence and install a transmitting microphone just because most people have transmitting microphones inside the house. There's a huge difference in a search that requires entering a home and one that does not.
SCOTUS actually seems pretty explicitly concerned in the text about technology eroding the expectation of privacy by reaching deeper into a private residence without having to enter it, so again this implies the case was about technology being used to search a home without entering it.
Courts tend to look at precedent. Any lawyer taking a case in this area will look at the case in question and try to twist it to state their side.
This isn't to say courts will rule one way or the other, or that the courts won't change their minds, but a well-reasoned opinion from a different court is powerful.
Note that re: Kyllo v United States, trust in any specific manufacturer is irrelevant. SCOTUS ruled that use of a technology - not a product - requires a warrant when it "is not in general public use".
The Kyllo case involved two federal agents that used their own thermal imaging camera to search a house for the presence of people and grow lights.
Glancing at the supreme court case, I'm not sure that observations that police officers can do from a public place have a good application to this area.
I think the public expectation part was specifically whether thermal imaging was too invasive and not an observation that the general public would make from the street.
Why would you possibly think you'll hear about times governments forced companies to turn these into wiretaps? You won't until you do - but then you'll know it was going on for years.
If you're not planning to release your own hardware, could you provide a list of devices the software is tested with? I looked on the site a bit, but the crazy art thing triggered my ADHD and OCD as well. I imagine the idea is that you can compile it for whatever device on any platform with some changes, but that makes community engagement harder without a go-to, known-good platform for testing builds and POCs.
> One application details how audio monitoring could help detect that a child is engaging in “mischief” at home by first using speech patterns and pitch to identify a child’s presence, one filing said. A device could then try to sense movement while listening for whispers or silence, and even program a smart speaker to “provide a verbal warning.”
This worries me. First the children are watched 24/7, then the adults.
This is why we need AI at the edge, not in the cloud, and privacy-by-design thinking in our architectures. It is the only way for people to know their data won't be compromised and misused, because it never leaves their devices.
Disclaimer: I'm a co-founder of https://snips.ai and we are building a 100% on-device and private-by-design platform to build Voice AI assistants
We would be happy to know what you do with it! You can take a look at what some people have built with it already https://github.com/snipsco/awesome-snips
We are open-sourcing it over time, starting with the NLU: https://medium.com/snips-ai/snips-nlu-is-an-open-source-priv.... Snips is available in English, French, German, and soon Japanese and Korean with more European languages coming this year.
You can start building your own private-by-design smart speaker on the platform in under 1h with this tutorial: https://medium.com/snips-ai/building-a-voice-controlled-home...