Infinite Context Paging Engine – Zero-copy LLM context paging in Rust ~419.34 µs

matheusdelgs1 pts0 comments

GitHub - matheusdelgado/infinite-context: An ultra-low latency, zero-copy context virtual memory paging engine written in Rust, designed to break physical VRAM limitations for LLMs and autonomous agents using attention-driven predictive prefetching.



ICPE: Infinite Context Paging Engine 🧠

An ultra-low latency, zero-copy context virtual memory paging engine written in Rust, designed to break physical VRAM limitations for LLMs and Long-Lived Autonomous Agents.

🔬 The Core Innovation

ICPE treats LLM token context layers exactly like operating system virtual memory (Paging/Swap). Instead of holding massive, low-activation histories in expensive GPU VRAM, ICPE utilizes an Attention-Driven Predictive Eviction algorithm to page out cold contexts to disk via memory-mapped files (mmap), prefetching hot slices back into high-speed memory nanoseconds before the next inference step.
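The proprietary eviction algorithm is not published, so the following is only a minimal sketch of the general idea: a fixed-capacity resident set that evicts the page with the lowest attention score. Every name here (`ResidentSet`, `PageId`, `admit`, `observe`) and the moving-average scoring rule are assumptions made for illustration, not ICPE's API or heuristics.

```rust
use std::collections::HashMap;

/// Identifier of one fixed-size slice ("page") of token context (hypothetical).
type PageId = u64;

/// A bounded set of pages held in fast memory, each tagged with a score
/// derived from recent attention. Illustrative only, not ICPE's data structure.
struct ResidentSet {
    capacity: usize,
    scores: HashMap<PageId, f32>,
}

impl ResidentSet {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, scores: HashMap::with_capacity(capacity) }
    }

    /// Blend the attention a resident page received in the latest step into
    /// its score (exponential moving average with weight 0.5, an assumption).
    fn observe(&mut self, page: PageId, attention: f32) {
        if let Some(score) = self.scores.get_mut(&page) {
            *score = 0.5 * *score + 0.5 * attention;
        }
    }

    /// Bring a page into the resident set. When the set is full, the page
    /// with the lowest score is evicted and returned so the caller can write
    /// it to its backing store (for example, a memory-mapped file).
    fn admit(&mut self, page: PageId, attention: f32) -> Option<PageId> {
        if self.scores.contains_key(&page) {
            self.observe(page, attention);
            return None;
        }
        let evicted = if self.scores.len() >= self.capacity {
            let coldest = self
                .scores
                .iter()
                .min_by(|a, b| a.1.total_cmp(b.1))
                .map(|(id, _)| *id);
            if let Some(id) = coldest {
                self.scores.remove(&id);
            }
            coldest
        } else {
            None
        };
        self.scores.insert(page, attention);
        evicted
    }

    fn is_resident(&self, page: PageId) -> bool {
        self.scores.contains_key(&page)
    }
}
```

With a capacity of 2, admitting pages scored 0.9 and 0.1 and then a third page evicts the 0.1 page. A linear scan for the coldest page keeps the sketch short; a real engine would more likely use a heap or clock-style structure.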

🛡️ Binary Security, Architecture & Compliance

This project is written 100% in Rust. To protect our core intellectual property and proprietary algorithms during public evaluation, the high-performance predictive engine and thread synchronization heuristics are distributed as pre-compiled, highly optimized Rust binary blobs (.a / .so) located in the /lib directory.

Target Architectures Provided: x86_64-unknown-linux-gnu and aarch64-unknown-linux-gnu (ARM64/Graviton compliant).

Compliance: All binaries are cryptographically signed and built via public, isolated GitHub Actions workflows. SHA-256 hashes are verified at runtime. The core contains 0% external networking, 0% telemetry, and operates strictly within local system memory bounds.

What is Up for Acquisition: Full clean-room source code of the core engine, mathematical specifications, compilation toolchains, and global IP ownership are strictly reserved for total acquisition.

📊 Verifiable Benchmarks (Criterion Release Mode)

ICPE avoids per-read I/O syscall overhead by memory-mapping context files into the process address space, backed directly by the kernel page cache, using memmap2 and zerocopy.

Prefetch & Eviction Latency: ~419.34 µs (Microseconds) under continuous concurrent thread stress, crossing the FFI boundary safely into the protected core.

Memory Copy Overhead: 0% (True Zero-Copy byte casting).

RAM Footprint: Deterministic, fixed, and completely bounded.
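To make "zero-copy byte casting" concrete, here is a small sketch using only the standard library. It reinterprets a byte buffer, such as the contents of a memory-mapped file, as `f32` values without copying. The helper names `view_as_f32` and `as_bytes` are hypothetical, and the engine itself is stated to use the zerocopy crate, whose API is not reproduced here.

```rust
/// Reinterpret a byte buffer as a slice of `f32` without copying.
/// Returns `None` when the buffer is misaligned for `f32` or its length is
/// not a multiple of 4, rather than silently dropping bytes.
fn view_as_f32(bytes: &[u8]) -> Option<&[f32]> {
    // SAFETY: every bit pattern is a valid `f32`, and `align_to` only puts
    // correctly aligned, fully covered elements into the middle slice.
    let (head, body, tail) = unsafe { bytes.align_to::<f32>() };
    if head.is_empty() && tail.is_empty() {
        Some(body)
    } else {
        None
    }
}

/// View `f32` values as raw bytes, again without copying.
fn as_bytes(values: &[f32]) -> &[u8] {
    // SAFETY: `u8` has alignment 1, `f32` has no padding bytes, and the
    // returned slice covers exactly the memory of `values`.
    unsafe {
        std::slice::from_raw_parts(
            values.as_ptr().cast::<u8>(),
            std::mem::size_of_val(values),
        )
    }
}
```

Both views share memory with the original buffer, so no bytes are moved. The values are read in native byte order, which means a file written on a machine with a different endianness would need explicit conversion.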

⚖️ Evaluation License

This public repository operates under a strict Open-Core Evaluation License. The architecture, Python wrappers, and benchmarking test-suites are fully open and verifiable. You are free to natively compile, benchmark, and run integration tests locally. Commercial use, production deployment, or cloud infrastructure embedding of the pre-compiled core without an Enterprise License or total IP Acquisition is strictly prohibited.

🚀 How to Run and Verify Performance

You can natively compile the project and audit the benchmarking claims directly on your local infrastructure.

1. Requirements (Linux)

Ensure you have the Rust toolchain, Python 3.12 development headers, and the native linker installed on your machine:

sudo apt update
sudo apt install build-essential python3-dev python3-config lld

2. Verify Local Benchmarks (Criterion)

The micro-benchmarking suite is isolated within the core source files to prevent Python runtime context symbol collisions. To run the statistical hardware latency reports, execute:

# Clear any stale linker metadata
cargo clean

# Run the target context manager benchmark suite
cargo bench --bench context_manager_bench

The detailed statistical distribution curves will be generated under target/criterion/report/index.html.
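Criterion remains the reference for the reported figure. For a rough, dependency-free cross-check of a latency number under thread contention, a sketch along these lines could be used. The functions `measure_latencies` and `percentile` are hypothetical helpers, and `op` stands in for whatever call into the engine you want to time; none of this is part of the repository.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Run `op` `iters` times on each of `threads` worker threads and return
/// every per-call latency, sorted ascending.
fn measure_latencies<F>(threads: usize, iters: usize, op: F) -> Vec<Duration>
where
    F: Fn() + Sync,
{
    let mut all = Vec::with_capacity(threads * iters);
    thread::scope(|scope| {
        let mut handles = Vec::with_capacity(threads);
        for _ in 0..threads {
            handles.push(scope.spawn(|| {
                let mut local = Vec::with_capacity(iters);
                for _ in 0..iters {
                    let start = Instant::now();
                    op();
                    local.push(start.elapsed());
                }
                local
            }));
        }
        for handle in handles {
            all.extend(handle.join().expect("worker thread panicked"));
        }
    });
    all.sort();
    all
}

/// Nearest-rank percentile over an ascending slice; `p` is in 0..=100.
fn percentile(sorted: &[Duration], p: f64) -> Option<Duration> {
    if sorted.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.saturating_sub(1).min(sorted.len() - 1)])
}
```

Calling `measure_latencies(8, 1_000, || { /* engine call */ })` and then `percentile(&samples, 50.0)` gives a median to compare against the Criterion report. Unlike Criterion, this does no warm-up or outlier analysis, so treat the result as a sanity check only.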

3. Test the...
