Llama.cpp b9180: MTP support landed

Release b9180 · ggml-org/llama.cpp · GitHub



github-actions released this on 16 May 16:48 · tag b9180 · commit 2555826

This commit was created on GitHub.com and signed with GitHub’s verified signature (Verified, GPG key ID: B5690EEEBB952194).

llama + spec: MTP Support (#22673)

spec: support MTP

fix batch size

rename files

cont : simplify (#7)

MTP: clean-up (#9)

MTP: clean-up

review: use llama_context_type instead of llama_graph_type

review: remove llama_model_has_mtp

review: fix convert issues

convert: fix pycheck

review: formatting

use mtp- for identifying mtp models

convert: fix mtp conversion

mtp -> draft-mtp

remove unused llama_arch

add need_embd in speculative

llama: allow partial seq_rm for GDN models for speculative decoding

Currently, speculative decoding has to restart from a checkpoint when some draft tokens are not accepted, which wastes work because the target has to be run again. This PR adds the ability to roll back up to draft_max tokens by storing the GDN intermediates.
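The idea behind partial rollback can be sketched with a toy recurrent state. In this minimal sketch the class name `SnapshotRecurrentState` and the `step` function are hypothetical stand-ins for a GDN layer, not llama.cpp code. A recurrent state cannot simply be truncated the way a KV cache can, so the sketch keeps the state after every token of the last verification batch. Discarding rejected draft tokens then becomes a lookup instead of a restore-from-checkpoint followed by re-running the target.

```python
from typing import Callable, Generic, List, TypeVar

S = TypeVar("S")
Tok = TypeVar("Tok")


class SnapshotRecurrentState(Generic[S, Tok]):
    """Hypothetical stand-in for one sequence's recurrent (GDN-like) state.

    eval() records the state after every token of the batch, so keep(n)
    can drop trailing rejected tokens without re-running the model.
    """

    def __init__(self, state: S, step: Callable[[S, Tok], S]) -> None:
        self.state = state
        self.step = step
        self._base = state          # state before the last eval() batch
        self._snaps: List[S] = []   # state after each token of that batch

    def eval(self, tokens: List[Tok]) -> None:
        self._base = self.state
        self._snaps = []
        for tok in tokens:
            self.state = self.step(self.state, tok)
            self._snaps.append(self.state)

    def keep(self, n: int) -> None:
        """Keep only the first n tokens of the last batch (a partial seq_rm analogue)."""
        if not 0 <= n <= len(self._snaps):
            raise ValueError("cannot keep more tokens than were evaluated")
        self.state = self._base if n == 0 else self._snaps[n - 1]
        self._snaps = self._snaps[:n]
```

Suppose a verification batch is the last accepted token followed by k draft tokens. If a drafts are accepted, you would call `keep(a + 1)`. The last token is always kept, so at most k (that is, draft_max) states are ever rolled back, which matches the bound stated in the note. The trade-off is memory: one extra state copy per evaluated token.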

fix pending state

vulkan: add GDN partial rollback

meta: extend check to axis 1

metal: add GDN partial rollback

Extend the gated delta net kernel to store intermediate states for partial rollback support on the Metal backend:

- Add K (snapshot slot count) as a function constant
- Read input state from slot 0 of the 3D state tensor
- Write intermediate states to different slots during token loop
- For K=1, maintain backward-compatible single-slot behavior

Ref: 8c05923

Assisted-by: llama.cpp:local pi
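The K-slot layout described for the Metal kernel can be mimicked in plain Python. This sketch makes several assumptions. It uses the gated delta rule as formulated in the Gated DeltaNet paper, S_t = α_t S_{t-1} (I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ, and llama.cpp's kernel may differ in layout or normalization. It also assumes that for K > 1 the state after token t is written to slot t. The exact slot indexing is not given in the notes, and `gdn_step` / `gdn_kernel` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]  # d_v x d_k recurrent state


def gdn_step(S: Matrix, k: List[float], v: List[float], alpha: float, beta: float) -> Matrix:
    """One gated delta rule update: alpha * S (I - beta k k^T) + beta v k^T."""
    Sk = [sum(S[i][j] * k[j] for j in range(len(k))) for i in range(len(S))]
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(len(k))]
        for i in range(len(S))
    ]


def gdn_kernel(slots: List[Matrix], ks, vs, alphas, betas) -> List[Matrix]:
    """slots is the K x d_v x d_k state tensor; the input state is read from slot 0.

    K == 1: only the final state is written back to slot 0 (single-slot behavior).
    K > 1:  the state after token t is written to slot t (assumed indexing).
    """
    K, n = len(slots), len(ks)
    if K > 1 and n > K:
        raise ValueError("more tokens than snapshot slots")
    S = [row[:] for row in slots[0]]  # copy input before any slot is overwritten
    for t in range(n):
        S = gdn_step(S, ks[t], vs[t], alphas[t], betas[t])
        if K > 1:
            slots[t] = [row[:] for row in S]
    if K == 1:
        slots[0] = S
    return slots
```

With K > 1, rolling back to "after token t" is just a read of slot t. The final slot holds the same state the K = 1 path would produce, which is what keeps the single-slot behavior backward-compatible.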

delta_net_base: use ggml_pad instead of new_tensor

review: add need_rs_seq

review: rename part_bounded to n_rs

review: deslop comments

review: rename, add asserts

server : adjust checkpoint logic (#11)

server : adjust checkpoint logic

cont : rm asserts

server-context: fix early exit

spec : fix compatibility with n-gram and add TODOs (#13)

metal : cleanup

llama : fix faulty bitwise check in recurrent memory

server : disable RS-based MTP in combination with other spec types

spec : add TODOs

cont : fix comment

cont : update comment

common : fix logic for ngram + mtp compat

llama-memory: enable checkpointing with partial rollback

cont: add test-case for loading into a dirty ctx

llama-memory-recurrent: clear rs_idx in clear

download: fix mtp path

llama-arch: fix enorm op

docs: update docs

conversion: fix type annotations

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
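In these notes MTP appears as a speculative draft type ("draft-mtp"). A draft head proposes several tokens, and the target model checks them in one batch. The sketch below shows only the generic greedy verification step under that assumption. `verify_greedy` is a hypothetical name, not part of llama.cpp's API, and sampling-based acceptance works differently.

```python
from typing import List, Tuple


def verify_greedy(draft: List[int], target: List[int]) -> Tuple[List[int], int]:
    """Greedy draft verification.

    draft:  k tokens proposed by the draft head (e.g. an MTP head).
    target: k + 1 argmax tokens from one target pass over [last_token] + draft;
            target[i] is what the target would emit where draft[i] was
            proposed, and target[k] follows the last draft token.
    Returns (tokens to append, number of draft tokens accepted).
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly one more token than draft")
    accepted = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        accepted += 1
    # Accepted prefix plus the target's own token at the first mismatch
    # (or after the last draft token if every draft token matched).
    return draft[:accepted] + [target[accepted]], accepted
```

The `len(draft) - accepted` rejected positions must then be removed from the target's memory. For a recurrent or GDN model, this is where the partial rollback in the notes replaces a restore-from-checkpoint plus re-evaluation.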

macOS/iOS:

macOS Apple Silicon (arm64)

macOS Apple Silicon (arm64, KleidiAI enabled)

macOS Intel (x64)

iOS XCFramework

Linux:

Ubuntu x64 (CPU)

Ubuntu arm64 (CPU)

Ubuntu s390x (CPU)

Ubuntu x64 (Vulkan)

Ubuntu arm64 (Vulkan)

Ubuntu x64 (ROCm 7.2)

Ubuntu x64 (OpenVINO)

Ubuntu x64 (SYCL FP32)

Ubuntu x64 (SYCL FP16)

Android:

Android arm64 (CPU)

Windows:

Windows x64 (CPU)

Windows arm64 (CPU)

Windows x64 (CUDA 12) - CUDA 12.4 DLLs

Windows x64 (CUDA 13) - CUDA 13.1 DLLs

Windows x64 (Vulkan)

Windows x64 (SYCL)

Windows x64 (HIP)

openEuler:

openEuler x86 (310p)

openEuler x86 (910b, ACL Graph)

openEuler aarch64 (310p)

openEuler aarch64 (910b, ACL Graph)

Assets 30
