I like the direction you are headed. Considering that the use of ASICs is going to rise, I think you should consider local installs through Docker (like machinebox.io) or another technique.
Also, federated learning would be the next thing to take on.
We do provide a Docker option as well. Federated learning looks like a good way to combine edge computing and deep learning: it can offer personalized models, and also use those local updates to improve the general model.
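For a sense of how that "local updates improve the general model" loop works, here's a toy sketch of federated averaging (FedAvg) in plain Python. The single-weight linear model, learning rate, and data are made up for illustration; real systems average tensor updates across many devices, but the shape is the same: raw data never leaves the device, only updated weights do.

```python
def local_step(w, data, lr=0.1):
    # one SGD step for y ~ w * x (squared loss) on this device's local data
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def fed_avg_round(w, device_data):
    # server broadcasts w, each device trains locally, server averages
    # the returned weights, weighted by local dataset size
    updates = [local_step(w, d) for d in device_data]
    sizes = [len(d) for d in device_data]
    return sum(u * n for u, n in zip(updates, sizes)) / sum(sizes)

# two devices whose data follows y = 2x; samples stay on-device
devices = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(50):
    w = fed_avg_round(w, devices)
# w converges to ~2.0, the weight both devices' data agrees on
```

The personalization angle in the comment above is the natural extension: each device can keep fine-tuning its local copy after the averaged model is broadcast back.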
Quoth WP: "Modern ASICs often include entire microprocessors, memory blocks including ROM, RAM, EEPROM, flash memory and other large building blocks. Such an ASIC is often termed a SoC (system-on-chip). "
The ARM core on the SoC is part of the ASIC, as is the VideoCore part.
I think what OP meant was ASICs specific to deep learning, like TPUs. However, as I see it, current frameworks are not mature enough to support GPUs and TPUs with the exact same code. There are also no standards, so every big org is going to build support for its own ASIC interfaces into the framework it manages. Is there an open-source interface for deep learning ASICs?
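To make the question concrete, here's a minimal sketch of what such a common interface could look like: a backend registry where each vendor plugs in its own kernels and model code stays device-agnostic. All names here (register_backend, matmul, the "cpu" key) are hypothetical, not any real framework's API.

```python
# registry mapping a device name to its implementations of each op
BACKENDS = {}

def register_backend(name, ops):
    """A vendor registers its kernels under a device name (hypothetical API)."""
    BACKENDS[name] = ops

def matmul(a, b, device="cpu"):
    # model code calls this; dispatch picks the right kernel at runtime
    return BACKENDS[device]["matmul"](a, b)

def _cpu_matmul(a, b):
    # reference CPU implementation; an ASIC vendor would ship its own
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

register_backend("cpu", {"matmul": _cpu_matmul})
# register_backend("tpu", {...})  # a vendor plug-in would slot in here

print(matmul([[1, 2]], [[3], [4]], device="cpu"))  # prints [[11]]
```

This is roughly the plug-in shape that per-framework efforts take today; the missing piece the comment points at is a standard for it that isn't owned by one org.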
Cambricon is going big in China, so it's not just Google and Apple. They claim to be 6 times faster than a GPU.
I am more interested in the potential to run video processing and voice models effortlessly on tiny devices, and also to train models offline or locally.
I think there is good scope for solutions (like vision recognition) that port well across AI chips.