Release b9180 · ggml-org/llama.cpp · GitHub
ggml-org
llama.cpp
b9180
github-actions released this 16 May 16:48
b9180
2555826
This commit was created on GitHub.com and signed with GitHub’s verified signature.
GPG key ID: B5690EEEBB952194
llama + spec: MTP Support (#22673)
spec: support MTP
fix batch size
rename files
cont : simplify (#7)
MTP: clean-up (#9)
MTP: clean-up
review: use llama_context_type instead of llama_graph_type
review: remove llama_model_has_mtp
review: fix convert issues
convert: fix pycheck
review: formatting
use mtp- for identifying mtp models
convert: fix mtp conversion
mtp -> draft-mtp
remove unused llama_arch
add need_embd in speculative
llama: allow partial seq_rm for GDN models for speculative decoding
Currently, when some draft tokens are not accepted, speculative decoding
has to restart from a checkpoint, which wastes work by running the target
again. This PR adds the ability to roll back up to draft_max tokens by
storing the GDN intermediates.
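The idea described above can be illustrated with a toy model. This is a minimal sketch, not the llama.cpp implementation: `ToyRecurrentState`, its update rule, and `replay_from_checkpoint` are hypothetical names invented for illustration. It only shows why keeping the intermediate states of a draft lets a rejected draft be undone without rerunning the accepted tokens from a checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecurrentState:
    """Hypothetical stand-in for a recurrent (GDN-like) state: one integer."""
    value: int = 0
    snapshots: list = field(default_factory=list)  # intermediates of the current draft

    def step(self, token: int) -> None:
        # Assumed toy update rule; a real model applies a gated delta update.
        self.value = self.value * 31 + token

    def run_draft(self, tokens, draft_max: int) -> None:
        # Store the state before the first token and after every draft token.
        assert len(tokens) <= draft_max
        self.snapshots = [self.value]
        for tok in tokens:
            self.step(tok)
            self.snapshots.append(self.value)

    def rollback_to(self, n_accepted: int) -> None:
        # Partial rollback: restore the state as it was after n_accepted tokens.
        self.value = self.snapshots[n_accepted]
        self.snapshots = []


def replay_from_checkpoint(checkpoint_value: int, accepted_tokens) -> int:
    # Baseline without stored intermediates: restore the checkpoint and
    # process the accepted tokens a second time.
    state = ToyRecurrentState(checkpoint_value)
    for tok in accepted_tokens:
        state.step(tok)
    return state.value


state = ToyRecurrentState(value=7)
checkpoint = state.value
draft = [3, 1, 4, 1]

state.run_draft(draft, draft_max=4)
state.rollback_to(2)  # only the first two draft tokens were accepted

# Both paths reach the same state; the rollback path runs no extra steps.
same = state.value == replay_from_checkpoint(checkpoint, draft[:2])
```

In this toy, the rollback costs one list lookup, while the baseline costs one extra `step` per accepted token. The memory cost is one stored state per draft position, which is why the rollback depth is bounded by draft_max.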
fix pending state
vulkan: add GDN partial rollback
meta: extend check to axis 1
metal: add GDN partial rollback
Extend the gated delta net kernel to store intermediate states for
partial rollback support on the Metal backend.
Add K (snapshot slot count) as a function constant
Read input state from slot 0 of the 3D state tensor
Write intermediate states to different slots during token loop
For K=1, maintain backward-compatible single-slot behavior
Ref: 8c05923
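The slot scheme in the list above can be sketched on the host side. This is a hedged illustration only: the slot ordering used here (slot 0 always holds the newest state, slot r holds the state with the last r tokens undone) is an assumption chosen so that K=1 reduces to single-slot behavior, and it may not match the layout the Metal kernel actually uses. `run_token_loop` and `step_fn` are hypothetical names.

```python
def step_fn(state: int, token: int) -> int:
    # Assumed toy update rule standing in for the gated delta net update.
    return state * 31 + token


def run_token_loop(state_slots, tokens, step):
    """Process tokens starting from slot 0 and return K output slots.

    Assumed layout: out[r] is the state with the last r tokens undone,
    for every r that fits in the K slots. Slots with no valid state are None.
    """
    k = len(state_slots)
    n = len(tokens)
    state = state_slots[0]      # read the input state from slot 0
    out = [None] * k
    if n < k:
        out[n] = state          # undoing all n tokens gives back the input state
    for t, tok in enumerate(tokens):
        state = step(state, tok)
        slot = n - 1 - t        # 0 for the last token, 1 for the one before, ...
        if slot < k:
            out[slot] = state   # write the intermediate state to its slot
    return out


tokens = [3, 1, 4, 1]

# K=1: only the final state is kept, as with a single-slot state tensor.
single = run_token_loop([7], tokens, step_fn)

# K=3: the final state plus the two states before it are kept.
multi = run_token_loop([7, None, None], tokens, step_fn)
```

Under this assumed layout, rolling back r tokens means using `multi[r]` as the new slot 0, and the next call reads its input from slot 0 regardless of K.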
Assisted-by: llama.cpp:local pi
delta_net_base: use ggml_pad instead of new_tensor
review: add need_rs_seq
review: rename part_bounded to n_rs
review: deslop comments
review: rename, add asserts
server : adjust checkpoint logic (#11)
server : adjust checkpoint logic
cont : rm asserts
server-context: fix early exit
spec : fix compatibility with n-gram and add TODOs (#13)
metal : cleanup
llama : fix faulty bitwise check in recurrent memory
server : disable RS-based MTP in combination with other spec types
spec : add TODOs
cont : fix comment
cont : update comment
common : fix logic for ngram + mtp compat
llama-memory: enable checkpointing with partial rollback
cont: add test-case for loading into a dirty ctx
llama-memory-recurrent: clear rs_idx in clear
download: fix mtp path
llama-arch: fix enorm op
docs: update docs
conversion: fix type annotations
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
macOS/iOS:
macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled)
macOS Intel (x64)
iOS XCFramework
Linux:
Ubuntu x64 (CPU)
Ubuntu arm64 (CPU)
Ubuntu s390x (CPU)
Ubuntu x64 (Vulkan)
Ubuntu arm64 (Vulkan)
Ubuntu x64 (ROCm 7.2)
Ubuntu x64 (OpenVINO)
Ubuntu x64 (SYCL FP32)
Ubuntu x64 (SYCL FP16)
Android:
Android arm64 (CPU)
Windows:
Windows x64 (CPU)
Windows arm64 (CPU)
Windows x64 (CUDA 12) - CUDA 12.4 DLLs
Windows x64 (CUDA 13) - CUDA 13.1 DLLs
Windows x64 (Vulkan)
Windows x64 (SYCL)
Windows x64 (HIP)
openEuler:
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)
Assets: 30