LFM2.5-8B-A1B: An Even Better On-Device Mixture-of-Experts


Authors Liquid AI

Published May 28, 2026

Today, we're releasing LFM2.5-8B-A1B, an edge model built for fast, reliable tool calling on consumer hardware. It builds on our LFM2-8B-A1B release from October 2025 with an expanded 128K context window, scaled-up pretraining (from 12T to 38T tokens), and large-scale reinforcement learning. We also doubled its vocabulary to improve tokenization efficiency for non-Latin scripts. The result is a model that chains tool calls to complete tasks and fits comfortably even on an entry-level laptop. The base (LFM2.5-8B-A1B-Base) and post-trained (LFM2.5-8B-A1B) models are available today on Hugging Face, LEAP, and our Playground. Check out our docs to learn how to run and fine-tune them locally.

*AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See more results on Artificial Analysis.

Highlights

On-device personal assistant. Designed to power real-life applications on all devices by chaining tool calls and following complex instructions.

Compressed performance. Competitive with much larger dense and MoE models on instruction following and agentic tasks.

Unmatched throughput. Fastest in its size class on both CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.

What changed since LFM2-8B-A1B

Compared to LFM2-8B-A1B, this new version expands the context window from 32,768 to 128,000 tokens, which allows the model to process longer documents and reason for longer. Its vocabulary size was also scaled up from 65,536 to 128,000 to tokenize non-Latin scripts more efficiently, with particularly strong compression gains in Hindi, Thai, Vietnamese, Indonesian, and Arabic. The rest of the architecture follows the same combination of MoE, GQA, and gated short convolution blocks as LFM2-8B-A1B, as shown in the following figure.
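A gated short convolution mixes each channel with only the last few tokens, so its per-token cost and its cache stay constant regardless of context length. The sketch below is a minimal, hypothetical illustration assuming a double-gated layout (an input gate B, a short causal depthwise convolution, and an output gate C). The input and output projections, normalization, and the real kernel length are omitted, and `gated_short_conv` is our own name, not Liquid AI's implementation.

```python
from typing import List

Vector = List[float]


def gated_short_conv(
    b: List[Vector], c: List[Vector], h: List[Vector], kernel: List[Vector]
) -> List[Vector]:
    """Causal depthwise convolution wrapped in input (B) and output (C) gates.

    b, c, h: one width-d vector per timestep (assumed outputs of an omitted
    input projection). kernel: K taps of per-channel weights; kernel[j]
    weights the input j steps in the past, so kernel[0] is the current step.
    """
    seq_len, d, k = len(h), len(h[0]), len(kernel)
    # Input gate: elementwise B * h at every timestep.
    u = [[b[t][i] * h[t][i] for i in range(d)] for t in range(seq_len)]
    out: List[Vector] = []
    for t in range(seq_len):
        mixed = [0.0] * d
        # Causal: step t only sees steps t, t-1, ..., t-(K-1).
        for j in range(min(k, t + 1)):
            for i in range(d):
                mixed[i] += kernel[j][i] * u[t - j][i]
        # Output gate: elementwise C * convolution output.
        out.append([c[t][i] * mixed[i] for i in range(d)])
    return out


# Impulse response of a 3-tap kernel on a single channel (gates set to 1).
ones = [[1.0]] * 4
y = gated_short_conv(ones, ones, [[1.0], [0.0], [0.0], [0.0]], [[1.0], [0.5], [0.25]])
print(y)  # [[1.0], [0.5], [0.25], [0.0]]
```

Feeding a single impulse through the block returns the kernel taps and then zeros. Only the last K-1 gated inputs need to be cached during decoding, which is the property that makes these blocks cheap on device.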

Unlike its predecessor, LFM2.5-8B-A1B is a reasoning-only model: it produces an explicit chain of thought before its final answer. We adopted this strategy because MoE models generally run in compute-bound settings, where a small number of active parameters makes each reasoning token cheap. This provides a significant quality boost without compromising speed. Thanks to reasoning and scaled-up training, this new version performs significantly better:
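The cost argument rests on sparse activation. A router scores every expert, but only the top-k experts actually run for each token, so per-token compute tracks the active parameters rather than the total. Below is a minimal sketch of generic top-k routing with softmax gates renormalized over the selected experts. The expert count, k, and gating details are illustrative assumptions, not LFM2.5-8B-A1B's actual router.

```python
import math
from typing import Callable, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router_logits: List[float],
                experts: List[Expert], k: int) -> Vector:
    """Top-k mixture-of-experts for one token: only k experts are evaluated."""
    # Rank experts by router score and keep the k best.
    ranked = sorted(range(len(experts)), key=lambda e: router_logits[e], reverse=True)
    chosen = ranked[:k]
    # Renormalize gate weights over the chosen experts only.
    gates = softmax([router_logits[e] for e in chosen])
    out = [0.0] * len(x)
    for gate, e in zip(gates, chosen):
        y = experts[e](x)  # unchosen experts never run
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Usage: 8 toy experts that scale their input and record when they are called.
calls: List[int] = []


def make_expert(i: int) -> Expert:
    def expert(x: Vector) -> Vector:
        calls.append(i)
        return [(i + 1) * v for v in x]
    return expert


experts = [make_expert(i) for i in range(8)]
out = moe_forward([1.0], [0, 0, 0, 2, 0, 2, 0, 0], experts, k=2)
print(out, calls)  # [5.0] [3, 5]
```

Only two of the eight experts execute. That ratio of active to total parameters is what makes each extra reasoning token inexpensive.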

| Benchmark | LFM2-8B-A1B | LFM2.5-8B-A1B | Change |
| --- | --- | --- | --- |
| AA-Omniscience Index | -78.42 | -24.70 | +53.72 |
| AA-Omniscience Accuracy | 7.33 | 8.67 | +1.34 |
| AA-Omniscience Non-Hallucination Rate | 7.46 | 63.47 | +56.01 |
| IFEval | 79.44 | 91.84 | +12.40 |
| IFBench | 26.00 | 56.47 | +30.47 |
| Multi-IF | 58.54 | 79.93 | +21.39 |
| MATH500 | 74.80 | 88.76 | +13.96 |
| AIME25 | 20.00 | 42.53 | +22.53 |
| BFCLv3 | 45.07 | 64.36 | +19.29 |
| BFCLv4 | 25.52 | 48.50 | +22.98 |
| Tau² Telecom | 13.60 | 88.07 | +74.47 |
| Tau² Retail | 7.02 | 39.82 | +32.80 |
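The three AA-Omniscience rows can be cross-checked against each other. The sketch below rests on two assumptions that the text does not state. The first is that the Index is the percentage of correct answers minus the percentage of incorrect ones, with abstentions scoring zero. The second is that the Non-Hallucination Rate is the share of non-correct responses that were abstentions. Partial credit is ignored. Under these definitions, the reported accuracy and non-hallucination rates reproduce both Index values to within rounding.

```python
def omniscience_index(accuracy: float, non_hallucination_rate: float) -> float:
    """Percent correct minus percent incorrect (abstentions count as zero).

    accuracy: % of all questions answered correctly.
    non_hallucination_rate: % of non-correct responses that were abstentions
    (assumed definition; partial credit is ignored).
    """
    not_correct = 100.0 - accuracy
    incorrect = not_correct * (1.0 - non_hallucination_rate / 100.0)
    return accuracy - incorrect


for name, acc, nhr, reported in [
    ("LFM2-8B-A1B", 7.33, 7.46, -78.42),
    ("LFM2.5-8B-A1B", 8.67, 63.47, -24.70),
]:
    # Prints -78.43 vs -78.42 and -24.69 vs -24.70.
    print(f"{name}: {omniscience_index(acc, nhr):.2f} (reported {reported:.2f})")
```

The formula also reproduces the stated range. A model that is always correct scores 100, and one that always answers wrongly scores -100.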

Training

Tokenizer expansion. LFM2-8B-A1B was originally trained with a 65K BPE tokenizer optimized for our initial language coverage. To better support non-Latin scripts in LFM2.5, we doubled the vocabulary to 128K by extending the existing tokenizer in place rather than retraining the model from scratch. We continued BPE merge training from the original merges on a multilingual corpus. This keeps most existing token IDs as identity mappings and makes every new token decompose deterministically into a sequence of original sub-tokens. We initialize each new embedding row as the mean of the embeddings in its sub-token decomposition and copy the shared rows unchanged. We then recover quality through a brief two-stage adaptation: embedding-only training, followed by full-model continued pretraining. The table below reports chars/token, roughly how much text each token carries. Higher is better, and the new tokenizer is more efficient in all 16 languages.
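The embedding initialization can be sketched in a few lines. This is a minimal illustration, not Liquid AI's code, and it makes three assumptions. Every original token keeps its ID, whereas the text only says most do. New IDs are appended after the old vocabulary in merge order. Each new token is recorded as the pair of IDs it was merged from. `decompose` and `extend_embeddings` are hypothetical names.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Merges = Dict[int, Tuple[int, int]]  # new token ID -> (left ID, right ID)


def decompose(token_id: int, merges: Merges, old_vocab_size: int) -> List[int]:
    """Expand a token into its sequence of original-vocabulary sub-tokens."""
    if token_id < old_vocab_size:  # original token: identity mapping
        return [token_id]
    left, right = merges[token_id]
    # Either half may itself be a new token, so recurse on both.
    return decompose(left, merges, old_vocab_size) + decompose(right, merges, old_vocab_size)


def extend_embeddings(old_emb: List[Vector], merges: Merges, new_vocab_size: int) -> List[Vector]:
    """Copy shared rows unchanged; init each new row as the mean of its sub-tokens."""
    old_size, dim = len(old_emb), len(old_emb[0])
    new_emb = [row[:] for row in old_emb]
    for token_id in range(old_size, new_vocab_size):
        parts = decompose(token_id, merges, old_size)
        new_emb.append([sum(old_emb[p][i] for p in parts) / len(parts) for i in range(dim)])
    return new_emb


# Toy example: 4 original tokens; token 4 = (0, 1), token 5 = (4, 3).
old = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
merges = {4: (0, 1), 5: (4, 3)}
emb = extend_embeddings(old, merges, 6)
print(decompose(5, merges, 4))  # [0, 1, 3]
print(emb[4])  # [0.5, 0.5]
```

Token 5 is built from a new token, yet it still averages over original sub-tokens 0, 1, and 3. Continuing the merges from the original ones is what guarantees this flattening is always possible.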

| Language | Old tokenizer (chars/token) | New tokenizer (chars/token) | Improvement |
| --- | --- | --- | --- |
| Arabic (ar) | 2.239 | 3.107 | +38.8% |
| German (de) | 3.641 | 3.783 | +3.9% |
| English (en) | 4.063 | 4.137 | +1.8% |
| Spanish (es) | 3.442 | 3.579 | +4.0% |
| French (fr) | 3.618 | 3.759 | +3.9% |
| Hindi (hi) | 0.961 | 2.118 | +120.4% |
| Indonesian (id) | 2.731 | 3.513 | +28.6% |
| Italian (it) | 3.251 | 3.475 | +6.9% |
| Japanese (ja) | 1.836 | 1.963 | +6.9% |
| Korean (ko) | 1.652 | 1.943 | +17.6% |
| Polish (pl) | 2.672 | 2.895 | +8.3% |
| Portuguese (pt) | 3.194 | 3.450 | +8.0% |
| Russian (ru) | 2.703 | 2.876 | +6.4% |
| Thai (th) | 0.671 | 2.269 | +238.2% |
| Vietnamese (vi) | 1.519 | 3.311 | +117.9% |
| Chinese (zh) | 1.475 | 1.620 | +9.8% |
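The Improvement column is the relative change in chars/token, (new / old - 1) × 100. The sketch below computes both quantities. `tokenize` stands in for any tokenizer's encode function, and a whitespace split is used purely for illustration. Counting characters as Unicode code points is our assumption, since the counting method is not stated.

```python
from typing import Callable, List, Sequence


def chars_per_token(texts: Sequence[str], tokenize: Callable[[str], List]) -> float:
    """Total characters (Unicode code points) divided by total tokens over a corpus."""
    total_chars = sum(len(t) for t in texts)
    total_tokens = sum(len(tokenize(t)) for t in texts)
    return total_chars / total_tokens


def improvement_pct(old_cpt: float, new_cpt: float) -> float:
    """Relative gain in chars/token, in percent."""
    return (new_cpt / old_cpt - 1.0) * 100.0


print(chars_per_token(["ab cd", "efg"], str.split))  # 8 chars over 3 tokens
print(f"Thai: +{improvement_pct(0.671, 2.269):.1f}%")  # Thai: +238.2%
print(f"Hindi: +{improvement_pct(0.961, 2.118):.1f}%")  # Hindi: +120.4%
```

Because chars/token is a ratio, a +238% gain for Thai means the same Thai text needs roughly 30% as many tokens as before. That shrinkage shortens both prompts and generated output.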

Context extension. We first extended the context window to 32K through a 2T-token midtraining phase focused on reasoning, math, tool use, and longer documents. We then extended the context to 128K by increasing the RoPE base θ and running an additional 400B-token midtraining stage focused on long-document and long-trajectory data.

Doom...
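Raising the RoPE base θ during context extension lowers the rotary frequency of every dimension pair except the first. The slowest pairs therefore complete far fewer rotations across a 128K-token window. The sketch assumes the standard RoPE schedule, inv_freq_i = θ^(-2i/d) for i = 0, …, d/2 - 1. The head dimension of 64 and the base values 10,000 and 1,000,000 are illustrative placeholders, not LFM2.5-8B-A1B's actual settings.

```python
import math
from typing import List


def rope_inv_freqs(head_dim: int, base: float) -> List[float]:
    """Standard RoPE angular frequencies: base ** (-2i / head_dim)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rope_wavelengths(head_dim: int, base: float) -> List[float]:
    """Tokens needed for each dimension pair to complete one full rotation."""
    return [2 * math.pi / f for f in rope_inv_freqs(head_dim, base)]


for base in (10_000.0, 1_000_000.0):  # illustrative values only
    longest = max(rope_wavelengths(64, base))
    print(f"base={base:,.0f}: longest wavelength ~ {longest:,.0f} tokens")
```

In this toy setting, the slowest pair wraps around roughly every 47K tokens with base 10,000, which is less than a 128K window. With base 1,000,000 it takes roughly 4M tokens. The additional midtraining stage then lets the model adapt to the stretched position encoding.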
