Hi Sean, perhaps you can help us understand the claim that analog hardware SNNs are more energy efficient than analog hardware non-spiking networks (e.g. based on floating gate transistors [1])?
Also, in the paper [2], Table 4 is a bit confusing: what is the "traditional hardware" they're referring to?
Finally, it's not clear from that paper, or from this "Brainstorm" article, that you actually have a working chip, capable of running Alexnet. Is that so? What is the current status of the actual hardware?
I'm not sure it's fair to compare the Brainstorm chip to the paper you linked to, since they take different inputs (binary vs. continuous) and they implement different systems (purely feed-forward vs. recurrent dynamical systems). But basically, the reason I would expect rates to require more energy than spikes is just basic math. Since energy is the area under the power curve, you can picture that intermittent spikes would consume less energy than continuously driving a rate signal. There are other advantages to spikes over rates, such as robustness to noise and component failure.
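Roughly what I have in mind, as a back-of-the-envelope toy sketch (the 1 mW drive power, 100 Hz spike rate and 1 ms pulse width are made-up numbers, not measurements from any chip):

```python
# Toy comparison: energy is the integral of power over time, so a line that is
# only driven during brief spikes accumulates less area than one driven
# continuously. All numbers here are made up for illustration.
import numpy as np

dt = 1e-6                          # 1 us time step
t = np.arange(0.0, 0.1, dt)        # 100 ms window
p_drive = 1e-3                     # assume 1 mW while a line is actively driven

# Continuous rate signal: the line is driven the whole time.
power_rate = np.full_like(t, p_drive)

# Spiking signal: 100 Hz spikes, each driving the line for 1 ms.
power_spike = np.zeros_like(t)
for ts in np.arange(0.0, 0.1, 1.0 / 100):
    power_spike[(t >= ts) & (t < ts + 1e-3)] = p_drive

energy_rate = np.sum(power_rate) * dt    # area under the power curve
energy_spike = np.sum(power_spike) * dt

print(f"rate coding:  {energy_rate * 1e6:.1f} uJ")
print(f"spike coding: {energy_spike * 1e6:.1f} uJ")   # ~10x less in this toy setup
```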
In table 4, I think they're comparing to GPU hardware, since that's what he was running everything on.
I think the current status of the hardware is prototypes are being produced. The most recent paper [1] used a circuit simulated in SPICE, so I'm guessing they're pretty close to production? I'm not sure, because:
a) I haven't heard anything from my lab-mates in a while.
b) Even if I did hear something from them, I'm not sure I'm allowed to talk about it. Hardware development is way more secretive than software development.
Sure, let's look at the "area under the curve". What would this area be for performing the equivalent of a weighted sum followed by a non-linearity on the best spiking network? Note that we must ensure the same precision as when using the analog "continuous rate" signals. If the SNN produces less accurate results, then it's really an apples-to-oranges comparison. Don't forget to take into account both static and dynamic power. Also, there's the question of speed: can you build an SNN chip which runs faster given the same power budget and the same target accuracy?
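To make the precision point concrete, here's a toy sketch of what I mean (a plain Poisson spike-count code, nothing specific to any particular chip, and all numbers are my own assumptions): the error of a spike-count estimate of a weighted sum shrinks roughly as one over the square root of the spike count, so matching the precision of an analog rate signal costs more spikes, i.e. more area under the curve.

```python
# Toy illustration: approximate y = relu(w . x) by encoding each x_i as a
# Poisson spike count over a window, and see how many spikes it takes to
# reach a given accuracy. Purely illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = rng.uniform(0, 1, n)            # inputs in [0, 1]
w = rng.uniform(-1, 1, n) / n       # weights
y_exact = max(np.dot(w, x), 0.0)    # weighted sum followed by a non-linearity

for spikes_per_unit in [10, 100, 1000, 10000]:
    errs = []
    for _ in range(100):
        counts = rng.poisson(x * spikes_per_unit)   # spike counts over the window
        y_spiking = max(np.dot(w, counts) / spikes_per_unit, 0.0)
        errs.append(abs(y_spiking - y_exact))
    print(f"{spikes_per_unit:6d} spikes per unit input -> "
          f"mean abs error {np.mean(errs):.2e}")
```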
If they are indeed comparing SNN to a GPU in that paper, then it's just plain horrible! 3 times better efficiency than a 32-bit FLOP digital GPU (which is probably 2-3 generations old by now)? That's not going to impress anyone. AFAIK, the best digital DL chips are already at least 10 times more efficient than the latest GPUs, and analog DL chips claim at least another order of magnitude on top of that. In my opinion, that's still not enough: if you want to build an analog chip (spiking or not), it had better be at least 1000 times more power-efficient than the best we can expect from Nvidia in the next couple of years. Otherwise it's just not worth the effort (and the inflexibility).
So, the bottom line: until there's a working chip capable of running Alexnet, we have no guarantees about its overall energy efficiency, or speed, or accuracy, or noise robustness. Simulating a tiny portion of it in SPICE does not really provide much insight. When the chip is built and working, then we can compare it to the best analog "continuous rate" chip running the same model, and only then will we be able to see which one is more efficient. Until that time, any claims that spikes are more efficient are unsubstantiated.
On the other hand, if you can devise an algorithm which is uniquely suited to spiking networks (biologically plausible backprop, or whatever), then sure, it's quite possible that you will be able to do things more efficiently. So, my question is, why try mapping DL models trained on "traditional hardware" to SNNs, which weren't designed to run them? Why not focus instead on finding those biologically plausible algorithms first? If your goal is to understand the brain, wouldn't it be more reasonable to continue experimenting in software until you do? Why build hardware to understand the brain? That's not a rhetorical question; perhaps there are good reasons, and I'd like to know them.
It's totally fair to want to wait for a comparison until there's actual hardware produced, especially for comparisons exclusively involving DL models. My initial "area under the curve" argument was rhetorical and not sufficiently empirically founded.
> If they are indeed comparing SNN to a GPU in that paper, then it's just plain horrible!
Yeah, the paper talks about how this is preliminary and bigger savings are expected for video. However, as before, I must concede that until the analog hardware is built there isn't much point discussing this.
> So, my question is, why try mapping DL models trained on "traditional hardware" to SNNs, which weren't designed to run them? Why not focus instead on finding those biologically plausible algorithms first? If your goal is to understand the brain, wouldn't it be more reasonable to continue experimenting in software until you do? Why build hardware to understand the brain?
Eric Hunsberger [1] has been doing most of the work in this domain, so I'm going to be awkwardly paraphrasing my conversations with him. Eric wanted to make Spaun's [2] vision system better. To do that, he knew he was going to need ConvNets, or at least build something off of them. So he started to see if he could bring ConvNets into the domain of SNNs to understand them better. Once he did that, he started looking into whether he could train the SNN ConvNets using biologically plausible back-prop [3], which is where he's at now.
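The gist of the conversion, as I understand it (a rough sketch of the general rate-to-spike idea, not Eric's exact method; the time constants are just typical-looking values): train the network using the smooth LIF rate curve as the activation function, then swap in spiking LIF neurons at run time, since their time-averaged output matches that rate.

```python
# Sketch of the rate-to-spike idea: the analytical LIF rate curve (usable as a
# training-time activation) matches the time-averaged output of a simulated
# spiking LIF neuron. Time constants below are assumed, not taken from the paper.
import math

tau_rc, tau_ref = 0.02, 0.002   # membrane and refractory time constants (s)

def lif_rate(j):
    """Steady-state LIF firing rate (Hz) for a constant input current j."""
    if j <= 1.0:
        return 0.0
    return 1.0 / (tau_ref + tau_rc * math.log1p(1.0 / (j - 1.0)))

def lif_spiking_rate(j, t_sim=1.0, dt=1e-4):
    """Simulate a spiking LIF neuron and return its average firing rate (Hz)."""
    v, refractory, n_spikes = 0.0, 0.0, 0
    for _ in range(int(t_sim / dt)):
        if refractory > 0:
            refractory -= dt
            continue
        v += dt * (j - v) / tau_rc
        if v > 1.0:                 # threshold crossed: spike, reset, hold
            n_spikes += 1
            v, refractory = 0.0, tau_ref
    return n_spikes / t_sim

for j in [1.5, 3.0, 10.0]:
    print(f"input {j:5.1f}: rate model {lif_rate(j):6.1f} Hz, "
          f"spiking sim {lif_spiking_rate(j):6.1f} Hz")
```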
That's really only one branch of our research. To understand the brain, we have a lot of really different methods that are more based in Symbolicism, Bayesianism and Dynamicism. We do start in software [2], but software is slow, even on a GPU. When we get faster hardware, we're able to explore the algos more quickly. Also, we got funding from the Office of Naval Research to build/explore analog hardware, so that's what this project is investigating.
To summarize, the DNN-to-SNN-adaptation algos aren't the only thing targeting this hardware; they're a small slice of a family of algos that are defining the requirements of the hardware.
(I hope this post confirms that I understood and accept your argument and isn't me having a case of "must have the last word"-ism)
I appreciate your response, especially the backstory of running a convnet on SNN hardware.
A couple of remarks:
1. Your own response to your stackexchange question: "The random synaptic feedback weights only accomplish back-propagation through one layer of neurons, thus severely limiting the depth of a network."
Didn't they show in the paper how it could work through multiple layers?
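As I read it, the trick is simply to send the error backwards through fixed random matrices at every layer instead of through the transposed forward weights. A minimal sketch of that idea on a toy regression problem (sizes and learning rate are my own made-up choices, not their experimental setup):

```python
# Minimal "feedback alignment" sketch with two hidden layers: errors travel
# backwards through fixed random matrices B2, B3 rather than W2.T, W3.T.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h1, n_h2, n_out = 30, 40, 20, 10
lr = 0.02

T = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)   # toy target mapping

# Forward weights (learned).
W1 = rng.normal(size=(n_h1, n_in)) * 0.1
W2 = rng.normal(size=(n_h2, n_h1)) * 0.1
W3 = rng.normal(size=(n_out, n_h2)) * 0.1

# Fixed random feedback weights (never updated).
B3 = rng.normal(size=(n_h2, n_out)) * 0.1
B2 = rng.normal(size=(n_h1, n_h2)) * 0.1

for step in range(5001):
    x = rng.normal(size=n_in)
    y_target = T @ x

    h1 = np.tanh(W1 @ x)          # forward pass
    h2 = np.tanh(W2 @ h1)
    y = W3 @ h2

    e = y - y_target              # error goes back through the random matrices
    d2 = (B3 @ e) * (1 - h2**2)
    d1 = (B2 @ d2) * (1 - h1**2)

    W3 -= lr * np.outer(e, h2)
    W2 -= lr * np.outer(d2, h1)
    W1 -= lr * np.outer(d1, x)

    if step % 1000 == 0:
        print(f"step {step:5d}  squared error {np.sum(e**2):.4f}")
```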
2. "software is slow, even on a GPU. When we get faster hardware, we're able to explore the algos more quickly"
I'm not sure what you're referring to by "faster hardware", but somehow I doubt you will beat a workstation with 8 GPUs in terms of speed, if your goal is to explore algos more quickly. More importantly, what if the next algo you want to explore does not map well to the hardware you built? For example, what if we realize that relative timings between spikes are important, and most of the computation is based on that, but your hardware was not designed to exploit these "race logic" principles (toy sketch at the end of this point)? Suddenly your custom system becomes much less useful, while your GPUs will simulate that just fine.
There's a possibility that instead of exploring the best or most plausible algorithms, you will limit yourself to algorithms which map well to your hardware.
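Here is the kind of thing I mean by "race logic" (just a toy illustration of timing-based computation, not a proposal for any particular chip): values are encoded as spike delays, so min and max fall out of which spike arrives first or last, something a purely rate-based design cannot exploit.

```python
# Toy "race logic": values are encoded as spike *delays*; min/max come from
# which spike arrives first/last. Purely illustrative.

def encode_as_delay(value, dt=1e-3):
    """Encode a non-negative value as a spike time (seconds)."""
    return value * dt

def race_min(delays):
    # An OR-like element fires as soon as the first input spike arrives,
    # so its output delay encodes the minimum of the inputs.
    return min(delays)

def race_max(delays):
    # An AND-like element fires only once all inputs have arrived,
    # so its output delay encodes the maximum.
    return max(delays)

values = [7, 3, 9, 5]
delays = [encode_as_delay(v) for v in values]
print("min via first spike:", race_min(delays) / 1e-3)   # -> 3.0
print("max via last spike: ", race_max(delays) / 1e-3)   # -> 9.0
```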
3. What do you think about HTM theory by Numenta? They strive for biologically realistic computation, but they don't think spikes are important, and abstract them away in their code.
p.s. the reason I'm involved in this discussion is that I'm trying to decide whether to accept an internship offer to work on Bayesian inference algorithms for an SNN chip.
(When does this thread get closed by Hacker News as being too old? If/when it does is there somewhere public you want to continue the discussion? If you'd like we can move it to the Nengo forums at forum.nengo.ai)
1. Dammit. I misread that paper. You're totally right that they do show it works for multiple layers.
2. By faster hardware, I mean neuromorphic hardware such as BrainScaleS and Spinnaker. The software we use, Nengo, is pretty dependent on the speed of a single GPU, since it's really hard to separate the networks across multiple GPUs. You're right that there's always the possibility that our newer algorithms don't map well onto the specialized hardware we've built. The reason why we think we're ready to at least implement a few hardware prototypes is:
- The principles that underlie our algorithms, the Neural Engineering Framework, have been around for 15 years and are pretty mature. The software built to support these principles, Nengo, has been through six re-writes and is finally pretty stable.
- Some of the hardware implementations are general enough that they can handle pretty drastic changes in algorithms. For example, Spinnaker is just a hexagonal grid of stripped-down ARM chips.
- Even if the hardware ends up limiting what algorithms we can implement, we can probably re-use a lot of the design to implement whatever the new algorithms require.
3. I've been meaning to investigate the HTM theory of Numenta for years (they were actually the first people to get me excited about brain-motivated machine intelligence), but never got around to it. I'm also super unclear on the relation between HTM, the Neural Engineering Framework and the newer theory FORCE. I'll write a question on cogsci.stackexchange.com to motivate myself to dig in.
Where is the internship happening? Will you be working with [Sughanda Sharma][1]? She's my lab-mate working on that using the Neural Engineering Framework.
Sure, let's move to Nengo forums. Do you mind creating a topic there and sending me a link?
I actually don't know much about spiking NNs (software or hardware), but Spaun seems like the only real competitor to HTM. Unlike Spaun, Numenta's algorithms are not currently limited by computational resources, because they focus on a rather small part of the neocortex (2-3 layers of a single region, working with tiny input sizes), and they abstract spikes away. Numenta claims that if we understand what spikes are doing (computation and information transfer), then there's no need to emulate them exactly; we can construct algorithms which do the same thing using traditional computations and data structures. Instead, HTM wants to understand what the layers do and how they interact.
I believe he has been working on it since, and plans to publish a book.
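To caricature what that abstraction looks like in practice (my own toy example, not Numenta's code): the activity of a population is reduced to a sparse binary vector, and similarity between patterns is just counting overlapping bits, with no spike times or membrane dynamics anywhere.

```python
# Caricature of the HTM-style abstraction: population activity as a sparse
# binary vector (an SDR), with similarity measured by bit overlap.
# Sizes are just typical-looking choices, not Numenta's parameters.
import numpy as np

rng = np.random.default_rng(0)
n_bits, n_active = 2048, 40

def random_sdr():
    sdr = np.zeros(n_bits, dtype=bool)
    sdr[rng.choice(n_bits, size=n_active, replace=False)] = True
    return sdr

a, b = random_sdr(), random_sdr()
noisy_a = a.copy()
noisy_a[rng.choice(np.flatnonzero(a), size=5, replace=False)] = False  # drop bits

print("overlap(a, noisy a):", int(np.sum(a & noisy_a)))   # stays high despite noise
print("overlap(a, b):      ", int(np.sum(a & b)))         # near zero by chance
```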
The internship offer is from HRL, a small company in Malibu. It's quite possible that they are looking at the ideas of your lab-mate, and they might even ask me to implement them. I'm still deciding though.
[1] https://arxiv.org/abs/1610.02091
[2] https://arxiv.org/abs/1611.05141