Josh Millar just released our latest preprint on how to make sense of the growing number of dedicated, ultra-low-power 'neural network accelerators' found in many modern embedded chipsets. My interest derives from wanting to decouple from the cloud in local environments where low latency matters, which in turn requires fast tensor operations in hardware.