grpyc: Fast gRPC replacement for Python, built in Rust

tgma | 1 pts | 0 comments

Drop-in API-compatible gRPC for Python<br>Up to 8x faster gRPC.<br>Rust safety.<br>Still Python.<br>grpyc is a drop-in replacement for grpcio 1.80, built in Rust. Up to 8x throughput on GKE, 2x lower latency, zero memory leaks. Change one import.

Up to 8x<br>Faster than grpcio

2.2x<br>Lower latency

3.4x<br>QPS per core

1 line<br>To migrate

Accelerate AI/ML Inference<br>vLLM, Triton, and TensorFlow Serving rely on gRPC between clients and inference workers. Python's grpcio adds latency that wastes expensive GPU time. grpyc reduces it.

814 tok/s<br>vLLM streaming throughput<br>grpyc delivers tokens faster than REST or grpcio in simulated vLLM inference with 50-token responses.

2.2x<br>Lower P50 latency<br>Lower transport latency means requests reach the GPU faster, improving batch fill rates and overall GPU utilization.

Better batching<br>Higher GPU efficiency<br>Faster request delivery fills GPU batches more efficiently. Less time waiting on Python means more tokens per second per dollar.


Pure Rust. No C. No compromises.<br>grpcio wraps the C core through Cython — inheriting memory leaks, GIL contention, and compilation hell. grpyc is a native Rust implementation: memory safe by default, zero C code anywhere in the stack.

Standard gRPC Python<br>Slower<br>Python API<br>Cython Wrappers<br>C Core Shim<br>gRPC C Core<br>OS / Network

grpyc (100% Rust)<br>Up to 8x<br>Python API (compatible surface)<br>PyO3 (Rust ↔ Python)<br>h2 + rustls (HTTP/2 + TLS, pure Rust)<br>Tokio (async I/O)<br>OS / Network

Memory safe by default: Rust ownership eliminates entire vulnerability classes<br>Minimal GIL contention: I/O and serialization in Rust, GIL released during network ops<br>No C compilation: prebuilt wheel, installs in seconds<br>No memory leaks: prevented by design, not just tested

Rust-powered. Safe by default.<br>A complete gRPC stack — 47 Rust modules, zero C code. Every feature enterprises need, with memory safety guarantees that C/C++ cannot provide.

Tokio Async Runtime<br>All I/O runs on the Tokio runtime, completely outside Python's GIL. True async without contention — the #1 source of tail latency in grpcio is gone.

Memory Safe by Design<br>Rust's ownership model eliminates use-after-free, buffer overflows, and data races at compile time. No more memory leaks under load. Security audited.

xDS Service Mesh<br>Full xDS support — LDS, RDS, CDS, EDS — for proxyless gRPC. Connect directly to your control plane. No sidecar proxy overhead.

TLS / mTLS via rustls<br>Modern TLS without OpenSSL. No compilation headaches, no dependency conflicts. Mutual TLS for zero-trust architectures.

Intelligent Load Balancing<br>Round-robin, ring-hash, weighted round-robin, outlier detection — all built in. ORCA load reporting for advanced traffic management.
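For readers unfamiliar with ring-hash balancing, here is a minimal, generic sketch of the idea in plain Python. It is not grpyc's implementation: the `HashRing` class, the `vnodes` parameter, and the backend names are all hypothetical, chosen only to illustrate how a key maps to a stable backend.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring (hypothetical, not grpyc's API)."""

    def __init__(self, backends, vnodes=100):
        # Each backend is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, key: str) -> str:
        # Walk clockwise to the first point after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["backend-a", "backend-b", "backend-c"])
print(ring.pick("user-42"))  # the same key always lands on the same backend
```

The property that matters for traffic management is stability: when a backend is removed, only the keys that were mapped to it move, and every other key keeps its backend.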

All 4 Streaming Modes<br>Unary, server streaming, client streaming, bidirectional — all fully async through Tokio. Flow control and backpressure handled in Rust.
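As a rough illustration of the four modes, the sketch below uses plain Python functions shaped like grpcio servicer methods: a single request or a request iterator goes in, a single response or a generator comes out. No gRPC library is involved, the handler names are hypothetical, and strings stand in for protobuf messages.

```python
from typing import Iterable, Iterator


def unary_unary(request: str, context=None) -> str:
    """Unary: one request in, one response out."""
    return request.upper()


def unary_stream(request: str, context=None) -> Iterator[str]:
    """Server streaming: one request in, a stream of responses out."""
    for word in request.split():
        yield word


def stream_unary(request_iterator: Iterable[str], context=None) -> str:
    """Client streaming: a stream of requests in, one response out."""
    return " ".join(request_iterator)


def stream_stream(request_iterator: Iterable[str], context=None) -> Iterator[str]:
    """Bidirectional: responses are produced as requests arrive."""
    for request in request_iterator:
        yield request.upper()


print(unary_unary("ping"))
print(list(unary_stream("token by token")))
print(stream_unary(iter(["a", "b", "c"])))
print(list(stream_stream(iter(["x", "y"]))))
```

Server streaming is the shape used for token-by-token generation: the handler yields each piece as soon as it is ready instead of building the full response first.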

Drop-in Compatible<br>Same grpc Python API. Change one import line. Your existing protobuf definitions, handlers, and interceptors work unchanged.
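One cautious way to trial a one-import migration is to prefer the new module and fall back to grpcio when it is not installed. This is a sketch under assumptions: the page does not show the actual import line, so the module name `grpyc` and the helper `load_grpc_module` are hypothetical.

```python
import importlib


def load_grpc_module(candidates=("grpyc", "grpc")):
    """Return the first importable module named in `candidates`.

    The default order prefers `grpyc` (assumed import name) and falls
    back to grpcio, which is imported as `grpc`.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {candidates!r} could be imported")


# Typical use at the top of a service module (requires one of the two packages):
#   grpc = load_grpc_module()
#   server = grpc.server(...)
```

Binding the result to the name `grpc` keeps the rest of the file untouched, which is what makes it easy to switch back if a compatibility gap shows up.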

Zero C Compilation<br>Prebuilt wheel. pip install grpyc takes seconds, not minutes. No build tools, no C compiler needed.

Unlock performance on your workload<br>Whether you're serving models, running on Google Cloud, or scaling your service mesh — grpyc removes the bottleneck.

AI & ML Inference<br>Ship faster inference — Model serving frameworks (vLLM, Triton, TensorFlow Serving) rely on gRPC between clients and inference workers. grpyc's Tokio runtime eliminates the gRPC overhead that wastes expensive GPU time.<br>Tighter latency → fewer retries → more effective GPU utilization<br>Lower memory per connection → more room for model weights<br>Streaming support for token-by-token generation

Google Cloud<br>Accelerate every GCP API call — BigQuery, Pub/Sub, Spanner, Firestore — the Google Cloud Python SDK uses gRPC for every call. grpyc replaces that transport layer with Rust-powered performance, zero code changes.<br>Higher throughput for BigQuery Storage API reads<br>Faster Pub/Sub publish and subscribe<br>Predictable Spanner latency for real-time apps

Service Mesh at Scale<br>Scale your service mesh without limits — In a mesh with hundreds of Python gRPC services, one slow service cascades into timeouts across the network. grpyc's predictable latency breaks the cascade chain.<br>Proxyless gRPC via built-in xDS — eliminate sidecar overhead<br>Tight P50–P99 spread keeps timeouts meaningful<br>Built-in load balancing, health checking, outlier detection

The numbers that prove it<br>GKE benchmarks with a Go client. Same protobuf definitions, same handler logic. grpyc vs grpcio vs REST.

Async Unary QPS (Java server)<br>GKE c2-standard-8, higher is better<br>grpyc 60,263

grpcio 14,595

Unary Latency (P50)<br>GKE ping-pong, lower is better<br>grpyc 208 µs

grpcio 451 µs

Cross-node QPS (c=64)<br>GKE e2-standard-4, higher is better<br>grpyc 10,145

FastAPI 5,261

grpcio 3,279
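To read the results above as ratios, the short script below divides the published figures. The numbers are copied from the three benchmarks listed; the variable and function names are just for illustration.

```python
# Figures copied from the benchmark results listed above.
QPS_ASYNC_UNARY = {"grpyc": 60_263, "grpcio": 14_595}
P50_LATENCY_US = {"grpyc": 208, "grpcio": 451}
QPS_CROSS_NODE = {"grpyc": 10_145, "FastAPI": 5_261, "grpcio": 3_279}


def speedup(better: float, worse: float) -> float:
    """Ratio of two measurements, rounded to two decimals."""
    return round(better / worse, 2)


# Throughput: higher is better, so divide grpyc by the baseline.
async_unary_vs_grpcio = speedup(QPS_ASYNC_UNARY["grpyc"], QPS_ASYNC_UNARY["grpcio"])
cross_node_vs_grpcio = speedup(QPS_CROSS_NODE["grpyc"], QPS_CROSS_NODE["grpcio"])
cross_node_vs_fastapi = speedup(QPS_CROSS_NODE["grpyc"], QPS_CROSS_NODE["FastAPI"])

# Latency: lower is better, so divide the baseline by grpyc.
p50_vs_grpcio = speedup(P50_LATENCY_US["grpcio"], P50_LATENCY_US["grpyc"])

print(f"Async unary QPS vs grpcio:  {async_unary_vs_grpcio}x")
print(f"P50 latency vs grpcio:      {p50_vs_grpcio}x lower")
print(f"Cross-node QPS vs grpcio:   {cross_node_vs_grpcio}x")
print(f"Cross-node QPS vs FastAPI:  {cross_node_vs_fastapi}x")
```

On these three results the ratios come out to roughly 4.1x async unary throughput, 2.2x lower P50 latency, and 3.1x (vs grpcio) or 1.9x (vs FastAPI) cross-node throughput.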

vLLM...

grpyc python rust grpc grpcio latency
