Liquid AI releases a 230M model optimized for phones, Raspberry Pi, and robots


LFM2.5-230M: Built to Run Anywhere — Blog — Liquid AI


Today, we're releasing LFM2.5-230M, our smallest model yet. It's a fast, lightweight foundation for developers to fine-tune and deploy in agentic workflows. Built on the LFM2 architecture, it delivers exceptionally fast inference and runs everywhere, from cloud GPUs to low-cost CPUs (213 tok/s decode speed on a Galaxy S25 Ultra, 42 tok/s on a Raspberry Pi 5). Despite its small size, it's surprisingly capable at tool use and data extraction tasks.

The base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) models are available today on Hugging Face. Check out our docs on how to run and fine-tune them locally.

Training & Fine-tuning

The model was pre-trained on 19T tokens, including a 32K context extension phase. We apply a lightweight post-training recipe designed to preserve flexibility for developers targeting their own downstream applications.

The recipe consists of three stages: (1) supervised fine-tuning with distillation from LFM2.5-350M, (2) direct preference optimization, and (3) multi-domain reinforcement learning. The final checkpoint balances strong out-of-the-box capabilities with adaptability to downstream specialization, while remaining competitive with larger models.

As an early look at ongoing work, we deployed LFM2.5-230M on a Unitree G1 humanoid robot, running entirely on-device on its onboard NVIDIA Jetson Orin. Here the model acts as a skill-selection layer: it takes a single natural-language instruction and decomposes it into a sequence of tool calls that invoke pre-trained low-level skills provided by NVIDIA's SONIC framework.

After a quick fine-tune for this task, the model turns a free-form command such as

"Hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters, hold a forward one-leg kneel for 5 seconds, and walk backward at 0.5 meters per second for 3 meters"

into a structured, multi-step plan, chaining skills like timed walking at a target velocity and a one-legged kneel. While the behaviors are deliberately simple at this stage, we think it's a compelling signal: a 230M-parameter model can be quickly fine-tuned and deployed on-device to serve as the natural-language control interface for a humanoid.
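To make the skill-selection idea concrete, here is a minimal sketch of what such a structured plan could look like and how a thin executor might dispatch it. Everything in it is an assumption for illustration: the skill names (`hold_still`, `walk`, `kneel_one_leg`), their arguments, and the JSON layout are hypothetical and are not the SONIC interface or the model's actual output format, which the post does not show.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    """One step of the plan: a skill name plus keyword arguments."""
    name: str
    arguments: dict


def parse_plan(model_output: str) -> list[ToolCall]:
    """Parse a JSON list of tool calls (assumed output format)."""
    return [ToolCall(c["name"], c["arguments"]) for c in json.loads(model_output)]


def run_plan(plan: list[ToolCall], skills: dict[str, Callable[..., str]]) -> list[str]:
    """Dispatch each call to its skill in order, rejecting unknown skill names."""
    log = []
    for call in plan:
        if call.name not in skills:
            raise ValueError(f"unknown skill: {call.name}")
        log.append(skills[call.name](**call.arguments))
    return log


# Hypothetical stub skills; a real system would command the robot here.
def hold_still(duration_s: float) -> str:
    return f"hold still for {duration_s}s"


def walk(velocity_mps: float, distance_m: float) -> str:
    # Negative velocity means walking backward; duration follows from distance / speed.
    duration_s = distance_m / abs(velocity_mps)
    return f"walk at {velocity_mps} m/s for {duration_s:.1f}s"


def kneel_one_leg(duration_s: float) -> str:
    return f"one-leg kneel for {duration_s}s"


SKILLS = {"hold_still": hold_still, "walk": walk, "kneel_one_leg": kneel_one_leg}

# A hand-written plan for the example command quoted above (not real model output).
EXAMPLE_OUTPUT = json.dumps([
    {"name": "hold_still", "arguments": {"duration_s": 2}},
    {"name": "walk", "arguments": {"velocity_mps": 1.0, "distance_m": 3.0}},
    {"name": "kneel_one_leg", "arguments": {"duration_s": 5}},
    {"name": "walk", "arguments": {"velocity_mps": -0.5, "distance_m": 3.0}},
])

for line in run_plan(parse_plan(EXAMPLE_OUTPUT), SKILLS):
    print(line)
```

Keeping the model's job to emitting a short list of named calls, and validating each name before dispatch, is one plausible way to keep a small model's output easy to check before anything reaches the robot.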

Benchmarks

We evaluated LFM2.5-230M on ten benchmarks covering both core capabilities and applied tasks: knowledge (GPQA Diamond, MMLU-Pro), instruction following (IFEval, IFBench, Multi-IF), data extraction (CaseReportBench), and tool use (BFCLv3, BFCLv4, τ²-Bench Telecom and Retail). Despite its size, it competes with and often beats models more than twice as large.

| Model | GPQA Diamond | MMLU-Pro | IFEval | IFBench | Multi-IF |
|---|---|---|---|---|---|
| LFM2.5-230M | 25.41 | 20.25 | 71.71 | 38.40 | 37.70 |
| LFM2.5-350M | 30.64 | 20.01 | 76.96 | 40.69 | 44.92 |
| LFM2-350M | 27.58 | 19.29 | 64.96 | 18.20 | 32.92 |
| Granite 4.0-H-350M | 22.32 | 13.14 | 61.27 | 17.22 | 28.70 |
| Granite 4.0-350M | 25.91 | 12.84 | 53.48 | 15.98 | 24.21 |
| Qwen3.5-0.8B (Instruct) | 27.41 | 37.42 | 59.94 | 22.87 | 41.68 |
| Gemma 3 1B IT | 23.89 | 14.04 | 63.49 | 20.33 | 44.25 |

| Model | CaseReportBench | BFCLv3 | BFCLv4 | τ²-Bench Telecom | τ²-Bench Retail |
|---|---|---|---|---|---|
| LFM2.5-230M | 22.51 | 43.26 | 21.03 | 5.26 | 13.68 |
| LFM2.5-350M | 32.45 | 44.11 | 21.86 | 18.86 | 17.84 |
| LFM2-350M | 11.67 | 22.95 | 12.29 | 10.82 | 5.56 |
| Granite 4.0-H-350M | 12.44 | 43.07 | 13.28 | 13.74 | 6.14 |
| Granite 4.0-350M | 0.84 | 39.58 | 13.73 | 2.92 | 6.14 |
| Qwen3.5-0.8B (Instruct) | 13.83 | 35.08 | 18.70 | 12.57 | 6.14 |
| Gemma 3 1B IT | 2.28 | 16.61 | 7.17 | 9.36 | 6.43 |
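As a quick check of the "often beats models more than twice as large" claim, the sketch below counts the benchmarks on which LFM2.5-230M scores higher than the two larger baselines. The scores are copied from the two tables above; the helper name `wins` and the ASCII benchmark labels are just illustrative choices.

```python
# Benchmark order matches the two tables above (first table, then second).
BENCHMARKS = [
    "GPQA Diamond", "MMLU-Pro", "IFEval", "IFBench", "Multi-IF",
    "CaseReportBench", "BFCLv3", "BFCLv4", "tau2-Bench Telecom", "tau2-Bench Retail",
]

SCORES = {
    "LFM2.5-230M": [25.41, 20.25, 71.71, 38.40, 37.70, 22.51, 43.26, 21.03, 5.26, 13.68],
    "Qwen3.5-0.8B (Instruct)": [27.41, 37.42, 59.94, 22.87, 41.68, 13.83, 35.08, 18.70, 12.57, 6.14],
    "Gemma 3 1B IT": [23.89, 14.04, 63.49, 20.33, 44.25, 2.28, 16.61, 7.17, 9.36, 6.43],
}


def wins(model: str, rival: str) -> list[str]:
    """Return the benchmarks where `model` scores strictly higher than `rival`."""
    return [
        bench
        for bench, ours, theirs in zip(BENCHMARKS, SCORES[model], SCORES[rival])
        if ours > theirs
    ]


for rival in ("Qwen3.5-0.8B (Instruct)", "Gemma 3 1B IT"):
    won = wins("LFM2.5-230M", rival)
    print(f"vs {rival}: {len(won)}/{len(BENCHMARKS)} wins: {', '.join(won)}")
```

On these numbers the 230M model comes out ahead on 6 of 10 benchmarks against Qwen3.5-0.8B and 8 of 10 against Gemma 3 1B IT; it trails both on Multi-IF and τ²-Bench Telecom.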

This makes LFM2.5-230M an ideal solution to power large-scale data extraction pipelines or lightweight on-device agentic workloads. However, given its compact size, we do not recommend it for reasoning-heavy workloads such as advanced math, code generation, or creative writing.

Fast Inference Everywhere

LFM2.5-230M ships with day-one support across the inference ecosystem:

- llama.cpp: GGUF checkpoints for efficient edge inference
- MLX: Optimized inference for Apple Silicon
- vLLM: GPU-accelerated serving for production throughput
- SGLang: GPU-accelerated serving for production throughput
- ONNX: Cross-platform inference across diverse accelerators

CPU inference. Thanks to the efficient LFM2 architecture, LFM2.5-230M is considerably faster than similar-sized models, including SSM hybrids and Gated Delta Networks. On both a Raspberry Pi 5 and a Qualcomm Snapdragon Gen4 (Samsung Galaxy S25 Ultra), it delivers the highest prefill and decode throughput in its class while keeping the smallest memory footprint. We tune the flash-attention flag per device to maximize prefill on each platform: enabled (-fa 1) on the Raspberry Pi 5 and disabled (-fa 0) on the Snapdragon Gen4.

GPU inference. For production-grade enterprise deployments, we have also developed an internal GPU inference stack that delivers extremely low-latency serving. We benchmark it against other small models running on SGLang, and across all concurrency levels, LFM2.5 models achieve considerably lower end-to-end latency.

Get Started

Start building today with LFM2.5-230M and LFM2.5-230M-Base, available on...
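For a rough sense of what the decode speeds quoted earlier (213 tok/s on the Galaxy S25 Ultra, 42 tok/s on the Raspberry Pi 5) mean in practice, here is a back-of-the-envelope sketch. It assumes a constant decode rate and ignores prefill time, which the post does not quantify, so treat the result as a lower bound on response time rather than a measurement. The 256-token response length is an arbitrary example.

```python
# Decode throughput figures quoted in the post, in tokens per second.
DECODE_TOK_PER_S = {"Galaxy S25 Ultra": 213.0, "Raspberry Pi 5": 42.0}


def decode_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Time to generate `num_tokens` at a constant decode rate (prefill excluded)."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return num_tokens / tok_per_s


RESPONSE_TOKENS = 256  # arbitrary example length, not from the post

for device, speed in DECODE_TOK_PER_S.items():
    seconds = decode_seconds(RESPONSE_TOKENS, speed)
    print(f"{device}: about {seconds:.1f} s to decode {RESPONSE_TOKENS} tokens")
```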
