
picoLLM is a cross-platform on-device LLM inference engine

Large language models (LLMs) can run locally on mini PCs or single-board computers such as the Raspberry Pi 5, but with limited performance due to their high memory consumption and bandwidth requirements. That’s why Picovoice has developed the cross-platform picoLLM Inference Engine SDK, optimized for running compressed large language models on Linux (x86_64), macOS (arm64, x86_64), and Windows (x86_64), Raspberry Pi OS on the Pi 5 and Pi 4, the Android and iOS mobile operating systems, as well as web browsers such as Chrome, Safari, Edge, and Firefox.

picoLLM Raspberry Pi 5

Alireza Kenarsari, CEO of Picovoice, told CNX Software that “picoLLM is a collaborative effort between Picovoice’s deep learning researchers, who developed the X-bit quantization algorithm, and the engineers who built the cross-platform LLM inference engine, to bring any LLM to any device and give control back to enterprises.”

The company claims that picoLLM provides better accuracy than GPTQ, using Llama-3-8B MMLU (Massive Multitask Language Understanding) scores as the metric, as shown in the chart below, with the largest gain at 2-bit quantization. The 4-bit quantized model achieves the same MMLU score as the 16-bit floating-point original.

Accuracy of picoLLM vs GPTQ
MMLU comparison between picoLLM and GPTQ for Llama-3-8B

On GitHub, you will find the picoLLM SDKs and demos for various programming languages and platforms. The solution is completely free for open-weight models, but requires an access key that is validated by connecting to Picovoice’s server. Once the access key is validated, all LLM processing takes place offline and on-device.
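For illustration, here is a minimal sketch of loading a model and generating a completion, assuming the picoLLM Python SDK (pip install picollm); the access key and the .pllm model file name below are placeholders.

import picollm

# The access key (from the Picovoice Console) is validated once against
# Picovoice's server; inference itself then runs entirely on-device.
pllm = picollm.create(
    access_key='YOUR_PICOVOICE_ACCESS_KEY',   # placeholder
    model_path='phi2-290.pllm')               # placeholder compressed model file

res = pllm.generate(prompt='What can I do with a Raspberry Pi 5?')
print(res.completion)

pllm.release()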

We’ve been writing about Picovoice since 2020 because the company offers easy-to-use voice activity detection (Cobra), custom wake word (Porcupine), speech-to-text (Leopard and Cheetah), and speech-to-intent (Rhino) solutions that run on low-end hardware like the Raspberry Pi and Arduino. The company has now combined several of these with its picoLLM engine to create an LLM-based voice assistant written in Python and running on a Raspberry Pi 5, shown in the video below.

[Embedded YouTube video: picoLLM-based voice assistant demo on Raspberry Pi 5]

Picovoice solutions are usually free for hobby projects with some limitations, and you should be able to reproduce the demo above on a Raspberry Pi 5 by following the instructions on GitHub.
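For reference, the sketch below shows roughly how such a pipeline fits together in Python, assuming the pvporcupine, pvcheetah, pvrecorder, and picollm packages; the access key and model path are placeholders, text-to-speech output is omitted, and the official GitHub demo remains the reference implementation.

import picollm
import pvcheetah
import pvporcupine
from pvrecorder import PvRecorder

ACCESS_KEY = 'YOUR_PICOVOICE_ACCESS_KEY'   # placeholder, from the Picovoice Console
MODEL_PATH = 'phi2-290.pllm'               # placeholder compressed model file

porcupine = pvporcupine.create(access_key=ACCESS_KEY, keywords=['picovoice'])
cheetah = pvcheetah.create(access_key=ACCESS_KEY, endpoint_duration_sec=1.0)
pllm = picollm.create(access_key=ACCESS_KEY, model_path=MODEL_PATH)

# Porcupine and Cheetah both consume 512-sample frames of 16 kHz audio.
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

try:
    print('Say the wake word...')
    while True:
        if porcupine.process(recorder.read()) >= 0:    # wake word detected
            print('Listening...')
            transcript = ''
            while True:                                # transcribe until the speaker stops
                partial, is_endpoint = cheetah.process(recorder.read())
                transcript += partial
                if is_endpoint:
                    transcript += cheetah.flush()
                    break
            print('> ' + transcript)
            res = pllm.generate(prompt=transcript)     # on-device LLM completion
            print(res.completion)
finally:
    recorder.delete()
    porcupine.delete()
    cheetah.delete()
    pllm.release()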
