grpyc: Up to 8x Faster gRPC for Python | Rust Safety, Drop-in Compatible<br>Drop-in API-compatible gRPC for Python<br>Up to 8x faster gRPC.<br>Rust safety.<br>Still Python.<br>grpyc is a drop-in replacement for grpcio 1.80, built in Rust. Up to 8x throughput on GKE, 2x lower latency, zero memory leaks. Change one import.
Up to 8x<br>Faster than grpcio
2.2x<br>Lower latency
3.4x<br>QPS per core
1 line<br>To migrate
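The page does not show the import itself, so the following is only a sketch of what a one-line migration could look like. It assumes the package is importable as `grpyc` and exposes the same surface as `grpc`; both the module name and the helper `load_grpc_module` are hypothetical. The helper picks the first backend that is installed, so the same code runs with or without grpyc.

```python
import importlib
import importlib.util


def load_grpc_module(preferred=("grpyc", "grpc")):
    """Return the first importable module named in `preferred`, or None.

    Assumption: grpyc is importable as `grpyc` and mirrors the `grpc` API.
    Neither the module name nor this helper comes from the grpyc docs.
    """
    for name in preferred:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None


# Typical use: bind the result to the name the rest of the code expects.
# grpc = load_grpc_module()
```

If the compatibility claim holds, the simplest form of the migration would be replacing `import grpc` with `import grpyc as grpc`; the fallback helper is useful mainly during a staged rollout.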
Accelerate AI/ML Inference<br>vLLM, Triton, and TensorFlow Serving rely on gRPC between clients and inference workers. Python's grpcio adds latency that wastes expensive GPU time. grpyc reduces it.
814 tok/s<br>vLLM streaming throughput<br>grpyc delivers tokens faster than REST or grpcio in simulated vLLM inference with 50-token responses.
2.2x<br>Lower P50 latency<br>Lower transport latency means requests reach the GPU faster, improving batch fill rates and overall GPU utilization.
Better batching<br>Higher GPU efficiency<br>Faster request delivery fills GPU batches more efficiently. Less time waiting on Python means more tokens per second per dollar.
Pure Rust. No C. No compromises.<br>grpcio wraps the C core through Cython — inheriting memory leaks, GIL contention, and compilation hell. grpyc is a native Rust implementation: memory safe by default, zero C code anywhere in the stack.
Standard gRPC Python<br>Slower<br>Python API<br>Cython Wrappers<br>C Core Shim<br>gRPC C Core<br>OS / Network
grpyc 100% Rust<br>Up to 8x<br>Python API (compatible surface)<br>PyO3 (Rust ↔ Python)<br>h2 + rustls (HTTP/2 + TLS, pure Rust)<br>Tokio (async I/O)<br>OS / Network
Memory safe by default: Rust ownership eliminates entire vulnerability classes<br>Minimal GIL contention: I/O and serialization run in Rust, and the GIL is released during network operations<br>No C compilation: prebuilt binary wheel, installs in seconds<br>No memory leaks from manual allocation: Rust's ownership model frees memory deterministically, by design rather than by testing alone
Rust-powered. Safe by default.<br>A complete gRPC stack — 47 Rust modules, zero C code. Every feature enterprises need, with memory safety guarantees that C/C++ cannot provide.
Tokio Async Runtime<br>All network I/O runs on the Tokio runtime, outside Python's GIL. That removes GIL contention, the #1 source of tail latency in grpcio, from the I/O path.
Memory Safe by Design<br>Rust's ownership model eliminates use-after-free, buffer overflows, and data races at compile time. No more memory leaks under load. Security audited.
xDS Service Mesh<br>Full xDS support — LDS, RDS, CDS, EDS — for proxyless gRPC. Connect directly to your control plane. No sidecar proxy overhead.
TLS / mTLS via rustls<br>Modern TLS without OpenSSL. No compilation headaches, no dependency conflicts. Mutual TLS for zero-trust architectures.
Intelligent Load Balancing<br>Round-robin, ring-hash, weighted round-robin, outlier detection — all built in. ORCA load reporting for advanced traffic management.
All 4 Streaming Modes<br>Unary, server streaming, client streaming, bidirectional — all fully async through Tokio. Flow control and backpressure handled in Rust.
Drop-in Compatible<br>Same grpc Python API. Change one import line. Your existing protobuf definitions, handlers, and interceptors work unchanged.
Zero C Compilation<br>Prebuilt binary wheel. pip install grpyc takes seconds, not minutes. No build tools or C compiler needed.
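The four streaming modes listed above correspond to four handler shapes in the standard gRPC Python servicer style, which a compatible API would need to preserve. The sketch below illustrates those shapes with plain Python and no gRPC dependency. `EchoServicer` and its method names are hypothetical, and strings stand in for protobuf messages.

```python
from typing import Iterable, Iterator


class EchoServicer:
    """Hypothetical servicer; strings stand in for protobuf messages."""

    def Unary(self, request: str, context=None) -> str:
        # Unary: one request in, one response out.
        return request.upper()

    def ServerStream(self, request: str, context=None) -> Iterator[str]:
        # Server streaming: one request in, a generator of responses out.
        for word in request.split():
            yield word

    def ClientStream(self, request_iterator: Iterable[str], context=None) -> str:
        # Client streaming: an iterator of requests in, one response out.
        return " ".join(request_iterator)

    def Bidi(self, request_iterator: Iterable[str], context=None) -> Iterator[str]:
        # Bidirectional: responses are yielded as requests are consumed.
        for message in request_iterator:
            yield message.upper()
```

In a real service these methods would be generated from a `.proto` file and registered with a server; here they can be called directly to see how each mode consumes and produces messages.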
Unlock performance on your workload<br>Whether you're serving models, running on Google Cloud, or scaling your service mesh — grpyc removes the bottleneck.
AI & ML Inference<br>Ship faster inference<br>Model serving frameworks (vLLM, Triton, TensorFlow Serving) rely on gRPC between clients and inference workers. grpyc's Tokio runtime eliminates the gRPC overhead that wastes expensive GPU time.<br>Tighter latency → fewer retries → more effective GPU utilization<br>Lower memory per connection → more room for model weights<br>Streaming support for token-by-token generation
Google Cloud<br>Accelerate every GCP API call<br>The Google Cloud Python client libraries for BigQuery Storage, Pub/Sub, Spanner, and Firestore use gRPC as their transport. grpyc replaces that transport layer with Rust-powered performance, with zero code changes.<br>Higher throughput for BigQuery Storage API reads<br>Faster Pub/Sub publish and subscribe<br>Predictable Spanner latency for real-time apps
Service Mesh at Scale<br>Scale your service mesh without limits<br>In a mesh with hundreds of Python gRPC services, one slow service cascades into timeouts across the network. grpyc's predictable latency breaks the cascade.<br>Proxyless gRPC via built-in xDS: eliminate sidecar overhead<br>Tight P50–P99 spread keeps timeouts meaningful<br>Built-in load balancing, health checking, outlier detection
The numbers that prove it<br>GKE benchmarks with a Go client. Same protobuf definitions, same handler logic. grpyc vs grpcio vs REST.
Async Unary QPS (Java server)<br>GKE c2-standard-8, higher is better<br>grpyc: 60,263<br>grpcio: 14,595
Unary Latency (P50)<br>GKE ping-pong, lower is better<br>grpyc: 208μs<br>grpcio: 451μs
Cross-node QPS (c=64)<br>GKE e2-standard-4, higher is better<br>grpyc: 10,145<br>FastAPI: 5,261<br>grpcio: 3,279
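To make the charts above easier to compare, the snippet below computes the speedup ratios implied by the published figures. The numbers are copied from the charts; the dictionary layout and the helper `ratio` are illustrative only.

```python
# Figures copied from the three benchmark charts above.
BENCHMARKS = {
    "async_unary_qps": {"grpyc": 60_263, "grpcio": 14_595},
    "unary_p50_latency_us": {"grpyc": 208, "grpcio": 451},
    "cross_node_qps": {"grpyc": 10_145, "fastapi": 5_261, "grpcio": 3_279},
}


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


qps = BENCHMARKS["async_unary_qps"]
lat = BENCHMARKS["unary_p50_latency_us"]
cross = BENCHMARKS["cross_node_qps"]

# Throughput: higher is better, so divide grpyc by the baseline.
async_qps_gain = ratio(qps["grpyc"], qps["grpcio"])            # 4.13
cross_vs_grpcio = ratio(cross["grpyc"], cross["grpcio"])       # 3.09
cross_vs_fastapi = ratio(cross["grpyc"], cross["fastapi"])     # 1.93

# Latency: lower is better, so divide the baseline by grpyc.
latency_gain = ratio(lat["grpcio"], lat["grpyc"])              # 2.17
```

The latency ratio of 2.17 matches the "2.2x lower latency" headline figure. These three charts show throughput gains of roughly 3x to 4x over grpcio.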
vLLM...