One of the arguments in favour of surveillance capitalism is the great usefulness of cloud-based ML predictions.

After all, who can deny the usefulness of photo apps that automatically recognize faces, detect your speech, or help you make sense of the deluge of information in a social feed?

The argument usually goes like this: these features require large neural networks, which in turn require a lot of computational power to train, and a lot of memory and disk storage to load and save the resulting models.

You can't do such things on small devices that run on batteries. Therefore, your phone *HAS* to send your data to remote servers if you want those features; otherwise, you just don't get them.

Except that... What if this whole argument is bollocks?

POET (Private Optimal Energy Training) shows that you can run both the training and the predictions locally, without compromising on either precision or performance.

After all, the really expensive part of training is back-propagation. POET tackles the back-propagation bottleneck in two ways: it quantizes the layers (so large real-valued tensor multiplications are reduced to smaller integer tensor multiplications, without sacrificing too much precision), and it caches the layers that are most likely to be needed again, so they don't have to be recalculated. It doesn't cache everything, though, since that would be prohibitive in terms of storage.
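To make the quantization idea a bit more concrete, here's a minimal NumPy sketch of my own (a toy illustration, not POET's actual scheme): a float matrix product approximated by an integer matrix product plus a single rescale at the end.

```python
import numpy as np

def quantize(x, n_bits=8):
    """Map a float tensor onto a symmetric integer grid.

    Returns the integer tensor and the scale needed to map back.
    (Illustrative only; real quantized-training schemes are far
    more sophisticated about where and how they round.)
    """
    scale = np.abs(x).max() / (2 ** (n_bits - 1) - 1)
    q = np.round(x / scale).astype(np.int32)
    return q, scale

# A float matmul...
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)
exact = a @ b

# ...approximated with integer arithmetic plus one final rescale.
qa, sa = quantize(a)
qb, sb = quantize(b)
approx = (qa @ qb).astype(np.float32) * (sa * sb)

rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error: {rel_err:.4f}")
```

The point is that the bulk of the arithmetic happens on small integers, which are much cheaper for a battery-powered chip, while the precision loss stays small.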

The arguments in the paper sound very convincing to me, and the code is publicly available on GitHub. I haven't had time to test it myself yet, but I will quite soon - and try to finally build an alternative voice assistant that runs completely on my phone.


It might be an improvement; those are interesting developments. What I'm afraid of, though, is that #SurveillanceCapitalism pushes us to the 'worst of both worlds', where #BigTech (or any corporation, really) smartly offloads the processing to our devices and yet later still manages to lay its hands on the #PII and/or the locally trained models. A big advantage to them: they've sorta #decentralized the harvesting, and it becomes even harder to circumvent when using local #AI.

@humanetech IMHO what really matters is that now we at least have an option. I tried to deploy my speech-detection algorithms to Android in the past (with Termux scripts that ran TensorFlow in the background), and training even a simple model would easily drain ~15-20% of the battery - and the output model was also about 1 GB in size. My current option for face detection relies instead on a Nextcloud photo plugin, but not everybody has the means or the skills to deploy and maintain their own NC instance.

They can do whatever they like, but we now have options to build ML algorithms that run locally and are actually usable and scalable. One of the arguments I've often heard from privacy-aware people who still used Google Assistant or Photos on their phones was "but I want/need these features, and no FLOSS alternative can provide them to me". Now we can go out and tell these people "well, actually, we can" - and that's quite a big deal :)


Yea. I think my concern is slated to come true regardless. Privacy-aware people have options in the software they choose, but on their devices they likely still can't avoid regular bursts of data going to #BigTech.

Via @rudolf I found this nice project #Googerteller

With local #AI I may get an #Alexa or whatever sending small packets of compressed, encrypted transcripts of my conversations instead of a large stream of audio data sent to the cloud.

@blacklight @rudolf

Now, in countries with proper regulation, and with the reputation of many #BigTech companies on the line, this kind of snooping on conversations may not happen. But in authoritarian regimes it's a whole different ballgame.

@blacklight also worth mentioning that digiKam already has fully offline face recognition that, IIRC, only costs ~300 MB of storage space to keep the model locally

@blacklight OK, that's so awesome I have no words. I simply can't imagine how large multiplications could be reduced to smaller ones without some kind of tradeoff. Somewhere in there, the laws of physics (or of time, or whatever) say there has to be a compromise between taking up more memory and executing in a larger timeframe, since the same computational effort has to be expended on an operation - it can't just vanish into the void.

@bgtlover previous discretization techniques did indeed come with heavy trade-offs between performance and precision, and the paper does mention that its aim is to improve on them.

This paper strikes an optimal point there: it uses quantization to reduce the complexity of the operations, but only up to a point. They call their technique "quantization, paging and rematerialization": it basically involves cached tables for some results (more or less like the tables of logarithms that my generation and the previous ones still used in school), and a kind of LRU caching to avoid recalculating results that have already been computed recently.
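The LRU idea can be sketched in a few lines of Python (my own toy, with a hypothetical `layer_activation` stand-in; this is not POET's actual code): re-run an "expensive" computation only on a cache miss, keeping at most a fixed number of recent results in memory - the same space/time trade-off the paper's paging strategy tunes.

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=4)
def layer_activation(layer_id: int) -> float:
    """Stand-in for an expensive forward-pass recomputation."""
    global calls
    calls += 1          # counts only actual recomputations (cache misses)
    return layer_id * 0.5  # pretend this is a costly computation

# Backprop walks over the layers repeatedly; repeats hit the cache.
for layer in [0, 1, 2, 1, 0, 2, 1]:
    layer_activation(layer)

print(f"recomputations: {calls}")  # 3 -- four of the seven calls were cache hits
```

With a bounded `maxsize`, memory use stays fixed while the most recently used results never need to be recomputed.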

As I've said many times (and as I learned from my experience as a developer), smart caching is often the best optimization for software, even if it doesn't look as elegant and concise as a mind-blowing new equation.
