Release v2.0.0 — The Own Everything Release · Zyora-Dev/zse · GitHub
zyoraclub released this on 22 May 11:47 · 4 commits to main since this release · tag v2.0.0 (commit 55a9742)
ZSE v2.0.0 — The "Own Everything" Release
A complete rewrite with zero third-party dependencies: no PyTorch, no Triton, no transformers, no bitsandbytes. A pure-Python kernel compiler emits CUDA C, HIP C, and Metal Shading Language directly.
Install size: ~3 GB → ~5 MB.
Headline numbers (Qwen2.5-14B INT4 vs vLLM AWQ INT4 on A100-80GB)
| Metric | ZSE | vLLM | ZSE advantage |
|---|---|---|---|
| Cold start | 6.29 s | 127.02 s | 20.2× faster |
| VRAM used | 12.28 GB | 71.45 GB | 5.82× less |
| Single-sequence throughput | 37.0 tok/s | 26.5 tok/s | 1.40× higher |
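Each ratio in the last column is the vLLM figure divided by the ZSE figure (or the inverse for throughput, where higher is better), so it can be re-derived from the two measured columns. A quick check in Python, using only the numbers in the table above:

```python
# Headline A100-80GB measurements from the table: (ZSE, vLLM AWQ INT4).
measurements = {
    "cold_start_s": (6.29, 127.02),      # lower is better
    "vram_gb": (12.28, 71.45),           # lower is better
    "single_seq_tok_s": (37.0, 26.5),    # higher is better
}

def advantage(zse: float, vllm: float, lower_is_better: bool) -> float:
    """Return ZSE's advantage as a ratio that is >= 1 when ZSE wins."""
    return vllm / zse if lower_is_better else zse / vllm

ratios = {
    "cold_start_s": advantage(*measurements["cold_start_s"], lower_is_better=True),
    "vram_gb": advantage(*measurements["vram_gb"], lower_is_better=True),
    "single_seq_tok_s": advantage(*measurements["single_seq_tok_s"], lower_is_better=False),
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The same arithmetic applied to the per-platform table gives the implied vLLM cold start on each GPU (ZSE cold start multiplied by the speedup).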
Validated on 6 platforms
| GPU | ZSE cold start | Speedup vs vLLM AWQ INT4 cold start |
|---|---|---|
| NVIDIA T4 (sm_75) | 7.25 s | 30.2× faster |
| NVIDIA L4 (sm_89) | 5.58 s | 26.0× faster |
| NVIDIA A10G (sm_86) | 6.01 s | 32.1× faster |
| NVIDIA A100-80GB | 6.29 s | 20.2× faster |
| AMD MI300X | 3.14 s | 13.6× faster (vs vLLM-ROCm FP16) |
| Apple M1 | n/a | End-to-end `vector_add` validated; full inference pending |
Install
```shell
pip install zse-engine
zse serve model.zse --port 8000
```
Or run the kernel compiler standalone:
```shell
pip install zse-compiler
```
What's in this release
ZSE Kernel Compiler — @zse.kernel Python DSL → CUDA / HIP / Metal. Warp primitives, vectorized memory, block reductions, tiling, fusion, WMMA, CDNA3 MFMA matrix cores, auto-tuning.
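To make "Python DSL → GPU source" concrete, here is a deliberately tiny sketch of source emission. It is not the `@zse.kernel` API (these notes do not show its signatures). `emit_elementwise_kernel` is a hypothetical helper that templates one element-wise operation into CUDA/HIP C or Metal Shading Language text, which is roughly the kind of output a source-emitting compiler hands to a backend compiler.

```python
# Hypothetical sketch, NOT the ZSE API: emit source text for an element-wise
# kernel computing out[i] = a[i] <op> b[i] on one of three backends.

OPS = {"add": "+", "sub": "-", "mul": "*"}

def emit_elementwise_kernel(name: str, op: str, dtype: str = "float",
                            backend: str = "cuda") -> str:
    """Return CUDA C, HIP C, or Metal Shading Language source for the kernel."""
    sym = OPS[op]
    if backend in ("cuda", "hip"):
        # HIP uses the same __global__ / blockIdx / threadIdx syntax as CUDA.
        return (
            f'extern "C" __global__ void {name}(const {dtype}* a, const {dtype}* b, '
            f"{dtype}* out, int n) {{\n"
            "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
            f"    if (i < n) out[i] = a[i] {sym} b[i];\n"
            "}\n"
        )
    if backend == "metal":
        return (
            "#include <metal_stdlib>\n"
            "using namespace metal;\n"
            f"kernel void {name}(device const {dtype}* a [[buffer(0)]],\n"
            f"                   device const {dtype}* b [[buffer(1)]],\n"
            f"                   device {dtype}* out [[buffer(2)]],\n"
            "                   constant uint& n [[buffer(3)]],\n"
            "                   uint i [[thread_position_in_grid]]) {\n"
            f"    if (i < n) out[i] = a[i] {sym} b[i];\n"
            "}\n"
        )
    raise ValueError(f"unsupported backend: {backend}")

print(emit_elementwise_kernel("vector_add", "add"))
```

A real compiler works from a traced or parsed kernel body rather than a fixed template, and layers on the optimizations listed above (vectorized loads, tiling, fusion); the template only shows where the per-backend differences live.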
.zse model format v2 — pre-quantized INT4/INT8/FP16, mmap-friendly, C-accelerated quantization (~600× faster). Adapters for Llama / Mistral / Qwen2 / Gemma2 / Phi3.
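Pre-quantized INT4 weights are commonly stored as 4-bit integers plus a per-group scale, with two values packed into each byte. The sketch below shows one common scheme: symmetric per-group absmax scaling. The `.zse` v2 on-disk layout is not described in these notes, so the group size, rounding, and nibble order here are assumptions. It is written in pure Python for readability; the release states that ZSE's own quantizer is C-accelerated.

```python
# Sketch of symmetric per-group INT4 quantization, two values per byte.
# Group size, rounding and nibble order are assumptions, not the .zse v2 layout.

def quantize_int4(weights: list[float], group_size: int = 32):
    """Return (packed bytes, per-group scales); quantized values lie in [-8, 7]."""
    if len(weights) % 2:
        raise ValueError("need an even number of weights to pack nibbles")
    q: list[int] = []
    scales: list[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        absmax = max(abs(w) for w in group)
        scale = absmax / 7 if absmax > 0 else 1.0   # map the group's absmax to +/-7
        scales.append(scale)
        # Python's round() is round-half-to-even; clamp to the signed 4-bit range.
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    packed = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        # & 0xF keeps the 4-bit two's-complement pattern; low nibble first.
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed), scales

def dequantize_int4(packed: bytes, scales: list[float], group_size: int = 32):
    """Unpack nibbles, sign-extend them, and multiply by each group's scale."""
    out: list[float] = []
    for byte in packed:
        for nib in (byte & 0xF, byte >> 4):
            val = nib - 16 if nib >= 8 else nib     # sign-extend the 4-bit value
            out.append(val * scales[len(out) // group_size])
    return out

weights = [0.5, -1.0, 0.25, 0.0, 2.0, -0.3, 0.7, 1.1]
packed, scales = quantize_int4(weights, group_size=4)
print(len(packed), "bytes for", len(weights), "weights")
```

Storing one scale per small group rather than per tensor keeps a single outlier from flattening the precision of every other weight.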
Own PagedAttention — adaptive block sizing, token-level eviction, FNV-1a dedup, COW forking.
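The bookkeeping behind block deduplication and copy-on-write (COW) forking can be sketched with reference-counted physical blocks. In this hypothetical `BlockPool`, each full block of token IDs is keyed by an FNV-1a hash chained with its prefix's hash, so sequences that share a prefix share physical blocks. Forking only bumps reference counts, and writing to a shared tail block copies it first. The block size, 32-bit hash width, and all names are assumptions rather than ZSE's implementation; a production pool would also guard against hash collisions.

```python
FNV_OFFSET_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193

def fnv1a_32(data: bytes, h: int = FNV_OFFSET_32) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime mod 2**32."""
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h

class BlockPool:
    """Hypothetical reference-counted KV block pool with hash dedup and COW."""

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.blocks: dict[int, list[int]] = {}    # physical block id -> token ids
        self.refcount: dict[int, int] = {}
        self.by_hash: dict[int, int] = {}         # chained prefix hash -> block id
        self.next_id = 0

    def _new_block(self, tokens: list[int]) -> int:
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = list(tokens)           # copy so blocks never alias
        self.refcount[bid] = 1
        return bid

    def allocate(self, tokens: list[int]) -> list[int]:
        """Build a block table, reusing full blocks whose entire prefix matches."""
        table: list[int] = []
        h = FNV_OFFSET_32
        for start in range(0, len(tokens), self.block_size):
            chunk = tokens[start:start + self.block_size]
            if len(chunk) < self.block_size:      # partial tail block: never shared
                table.append(self._new_block(chunk))
                break
            # Chain the hash so a block is reused only if its whole prefix matches.
            h = fnv1a_32(b"".join(t.to_bytes(4, "little") for t in chunk), h)
            bid = self.by_hash.get(h)
            if bid is None:
                bid = self._new_block(chunk)
                self.by_hash[h] = bid
            else:
                self.refcount[bid] += 1
            table.append(bid)
        return table

    def fork(self, table: list[int]) -> list[int]:
        """COW fork: share every block and bump its reference count."""
        for bid in table:
            self.refcount[bid] += 1
        return list(table)

    def append_token(self, table: list[int], token: int) -> None:
        """Append one token, copying the tail block first if it is shared."""
        tail = table[-1] if table else None
        if tail is None or len(self.blocks[tail]) == self.block_size:
            table.append(self._new_block([token]))
            return
        if self.refcount[tail] > 1:               # shared: copy before writing
            self.refcount[tail] -= 1
            tail = self._new_block(self.blocks[tail])
            table[-1] = tail
        self.blocks[tail].append(token)
```

Chaining the hash, rather than hashing each block on its own, prevents two sequences from sharing a block that merely contains the same tokens after different prefixes, since the cached keys and values depend on the whole prefix.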
ZStreamer — continuous batching, disaggregated prefill/decode, chunked prefill, speculative decoding (n-gram + self-draft).
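N-gram speculative decoding (often called prompt lookup) drafts tokens without a second model. It finds the most recent earlier occurrence of the last few tokens and proposes whatever followed them; the target model then scores the draft in one forward pass and keeps the longest agreeing prefix. Below is a minimal sketch with greedy verification. `ngram_draft` and `accept_draft` are hypothetical names, and the target model's argmax predictions are passed in as a plain list rather than computed.

```python
def ngram_draft(tokens: list[int], n: int = 2, max_draft: int = 4) -> list[int]:
    """Propose up to max_draft tokens by matching the last n tokens earlier on."""
    if len(tokens) < n:
        return []
    suffix = tokens[-n:]
    # Scan right-to-left (excluding the suffix itself) so the latest match wins.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + max_draft]
    return []

def accept_draft(draft: list[int], target_preds: list[int]) -> list[int]:
    """Greedy verification of a draft.

    target_preds[i] is the target model's argmax after the context plus draft[:i];
    it has len(draft) + 1 entries because one verification pass also yields the
    token that follows a fully accepted draft.
    """
    accepted: list[int] = []
    for i, tok in enumerate(draft):
        if tok != target_preds[i]:
            accepted.append(target_preds[i])  # take the target's token, then stop
            return accepted
        accepted.append(tok)
    accepted.append(target_preds[len(draft)])  # whole draft matched: bonus token
    return accepted

context = [5, 6, 7, 8, 9, 5, 6]
draft = ngram_draft(context)                   # matches the earlier "5 6"
print(draft, accept_draft(draft, [7, 8, 1, 3, 4]))
```

Every step emits at least one token, so a wrong draft costs only the wasted verification work, while repetitive text (code, quoted documents) can accept several tokens per forward pass.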
Orchestrator — unified VRAM allocator, 29 GPU kernels on MI300X, CUDA Graphs + HIP Graphs, LoRA hot-swap.
Server — OpenAI-compatible API, API key auth, rate limiting, SQLite store, built-in RAG (/v1/rag/*), web dashboard.
RAG — BM25 + TF-IDF + dense embeddings (via the loaded LLM, zero extra deps) + Reciprocal Rank Fusion + LLM cross-encoder rerank.
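Reciprocal Rank Fusion combines the BM25, TF-IDF, and dense rankings without calibrating their raw scores against each other. Each document scores the sum of 1 / (k + rank) over every ranker that returned it. The sketch uses k = 60, the value most commonly used for RRF; ZSE's own constant and tie-breaking rule are not stated in these notes.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids (1-based ranks); higher fused score is better."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by doc id so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

bm25 = ["d3", "d1", "d7"]
tfidf = ["d1", "d3", "d9"]
dense = ["d1", "d7", "d3"]
fused = reciprocal_rank_fusion([bm25, tfidf, dense])
print([doc for doc, _ in fused])  # ['d1', 'd3', 'd7', 'd9']
```

Because only ranks matter, a retriever with a very different score scale cannot dominate the fusion; a document found by several retrievers at moderate ranks tends to beat one found by a single retriever at rank 1.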
Tensor Parallelism — pure-ctypes NCCL/RCCL wrapper, multi-process workers.
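Tensor parallelism splits each weight matrix across GPUs so that every rank computes a slice of a layer's output; a collective such as all-gather or all-reduce (the kind of operation an NCCL/RCCL wrapper provides) then reassembles the result. The pure-Python sketch below simulates a column-parallel linear layer in a single process, with list concatenation standing in for the all-gather. It illustrates the partitioning only and does not use ZSE's wrapper; the function names are hypothetical.

```python
def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """y = W x, with W stored as rows of shape (out_features, in_features)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def shard_output_dim(w: list[list[float]], world_size: int) -> list[list[list[float]]]:
    """Split W's output rows evenly across ranks (column-parallel for y = x A, A = W^T)."""
    if len(w) % world_size:
        raise ValueError("out_features must be divisible by world_size")
    step = len(w) // world_size
    return [w[r * step:(r + 1) * step] for r in range(world_size)]

def column_parallel_forward(w: list[list[float]], x: list[float],
                            world_size: int) -> list[float]:
    """Each simulated rank computes its output slice; concatenation mimics all-gather."""
    shards = shard_output_dim(w, world_size)
    partials = [matvec(shard, x) for shard in shards]  # one entry per rank
    return [v for part in partials for v in part]      # gather in rank order

W = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
print(column_parallel_forward(W, x, world_size=2))
```

In a real deployment each shard lives on a different GPU and only the gathered activations cross the interconnect, which is why the collective library matters for multi-GPU throughput.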
Breaking changes
Package rename: zllm-zse → zse-engine on PyPI
Module rename: zse → zse_engine
.zse format v2 is incompatible with 1.x — re-convert with zse convert
bnb / bitsandbytes backend removed
PyTorch / Triton / transformers dependencies removed
Full migration guide and detailed change log: CHANGELOG.md
Acknowledgments
AMD MI300X validation, 32B-parameter benchmarks, and our ROCm wave-64 kernel development were made possible by DigitalOcean's Open Source Sponsorship Program.
447 tests passing. Zero dependencies. Three GPU backends. One package.