grpyc: Fast gRPC replacement for Python, built in Rust

Drop-in, API-compatible gRPC for Python

Up to 8x faster gRPC. Rust safety. Still Python.

grpyc is a drop-in replacement for grpcio 1.80, built in Rust. Up to 8x throughput on GKE, 2.2x lower latency, zero memory leaks. Change one import.

Up to 8x faster than grpcio

2.2x lower latency

3.4x QPS per core

1 line to migrate

Accelerate AI/ML Inference
vLLM, Triton, and TensorFlow Serving rely on gRPC between clients and inference workers. Python's grpcio adds latency that wastes expensive GPU time. grpyc reduces it.

814 tok/s vLLM streaming throughput
grpyc delivers tokens faster than REST or grpcio in simulated vLLM inference with 50-token responses.

2.2x lower P50 latency
Lower transport latency means requests reach the GPU faster, improving batch fill rates and overall GPU utilization.

Better batching, higher GPU efficiency
Faster request delivery fills GPU batches more efficiently. Less time waiting on Python means more tokens per second per dollar.


Pure Rust. No C. No compromises.
grpcio wraps the C core through Cython, inheriting memory leaks, GIL contention, and compilation hell. grpyc is a native Rust implementation: memory safe by default, zero C code anywhere in the stack.

Standard gRPC Python (slower):
Python API → Cython Wrappers → C Core Shim → gRPC C Core → OS / Network

grpyc (100% Rust, up to 8x):
Python API (compatible surface) → PyO3 (Rust ↔ Python) → h2 + rustls (HTTP/2 + TLS, pure Rust) → Tokio (async I/O) → OS / Network

Memory safe by default: Rust ownership eliminates entire vulnerability classes
Minimal GIL contention: I/O and serialization in Rust, GIL released during network ops
No C compilation: pure Python wheel, installs in seconds
No memory leaks: impossible by design, not just tested

Rust-powered. Safe by default.
A complete gRPC stack: 47 Rust modules, zero C code. Every feature enterprises need, with memory safety guarantees that C/C++ cannot provide.

Tokio Async Runtime
All I/O runs on the Tokio runtime, completely outside Python's GIL. True async without contention: the #1 source of tail latency in grpcio is gone.

Memory Safe by Design
Rust's ownership model eliminates use-after-free, buffer overflows, and data races at compile time. No more memory leaks under load. Security audited.

xDS Service Mesh
Full xDS support (LDS, RDS, CDS, EDS) for proxyless gRPC. Connect directly to your control plane. No sidecar proxy overhead.

TLS / mTLS via rustls
Modern TLS without OpenSSL. No compilation headaches, no dependency conflicts. Mutual TLS for zero-trust architectures.

Intelligent Load Balancing
Round-robin, ring-hash, weighted round-robin, and outlier detection are all built in. ORCA load reporting for advanced traffic management.
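To make the ring-hash policy named above concrete, here is a minimal sketch of consistent hashing in plain Python. It illustrates the general technique only and is not grpyc's implementation. The `HashRing` class, the `replicas` parameter, the backend addresses, and the choice of SHA-256 are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring, for illustration only."""

    def __init__(self, backends, replicas=100):
        if not backends:
            raise ValueError("at least one backend is required")
        # Each backend is placed on the ring at `replicas` virtual points.
        self._points = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._points]

    def pick(self, request_key: str) -> str:
        # Walk clockwise to the first point after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_right(self._hashes, _hash(request_key))
        return self._points[idx % len(self._points)][1]


backends = ["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"]
ring = HashRing(backends)
print(ring.pick("user-42"))
```

The property that makes this policy useful is that removing one backend only remaps the keys that were assigned to it; every other key keeps its backend, which preserves cache locality during scale-down.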

All 4 Streaming Modes
Unary, server streaming, client streaming, bidirectional: all fully async through Tokio. Flow control and backpressure handled in Rust.

Drop-in Compatible
Same grpc Python API. Change one import line. Your existing protobuf definitions, handlers, and interceptors work unchanged.
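As a rough sketch of what a one-line migration could look like in practice, the snippet below prefers grpyc when it is installed and falls back to grpcio otherwise. It assumes the package is importable as `grpyc` and exposes the same surface as the `grpc` module; the page states compatibility but does not show the import itself. The helper `pick_grpc_module` is hypothetical and uses only the standard library.

```python
import importlib
import importlib.util


def pick_grpc_module(candidates=("grpyc", "grpc")):
    """Return the first importable module from `candidates`, or None.

    The module name "grpyc" is an assumption; "grpc" is grpcio's module.
    """
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None


grpc = pick_grpc_module()
if grpc is None:
    print("neither grpyc nor grpcio is installed")
else:
    print(f"using {grpc.__name__} as the gRPC implementation")
```

A fallback like this keeps the rest of the codebase referring to a single `grpc` name, so the swap can be rolled out or reverted without touching handlers or interceptors.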

Zero C Compilation
Pure Python wheel. pip install grpyc takes seconds, not minutes. No build tools, no C compiler needed.

Unlock performance on your workload
Whether you're serving models, running on Google Cloud, or scaling your service mesh, grpyc removes the bottleneck.

AI & ML Inference
Ship faster inference. Model serving frameworks (vLLM, Triton, TensorFlow Serving) rely on gRPC between clients and inference workers. grpyc's Tokio runtime eliminates the gRPC overhead that wastes expensive GPU time.
Tighter latency → fewer retries → more effective GPU utilization
Lower memory per connection → more room for model weights
Streaming support for token-by-token generation

Google Cloud
Accelerate every GCP API call. BigQuery, Pub/Sub, Spanner, Firestore: the Google Cloud Python SDK uses gRPC for every call. grpyc replaces that transport layer with Rust-powered performance, zero code changes.
Higher throughput for BigQuery Storage API reads
Faster Pub/Sub publish and subscribe
Predictable Spanner latency for real-time apps

Service Mesh at Scale
Scale your service mesh without limits. In a mesh with hundreds of Python gRPC services, one slow service cascades into timeouts across the network. grpyc's predictable latency breaks the cascade chain.
Proxyless gRPC via built-in xDS: eliminate sidecar overhead
Tight P50–P99 spread keeps timeouts meaningful
Built-in load balancing, health checking, outlier detection

The numbers that prove it
GKE benchmarks with a Go client. Same protobuf definitions, same handler logic. grpyc vs grpcio vs REST.

Async Unary QPS (Java server), GKE c2-standard-8, higher is better
grpyc: 60,263
grpcio: 14,595

Unary Latency (P50), GKE ping-pong, lower is better
grpyc: 208μs
grpcio: 451μs

Cross-node QPS (c=64), GKE e2-standard-4, higher is better
grpyc: 10,145
FastAPI: 5,261
grpcio: 3,279
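The ratios implied by the figures above can be worked out directly. The short script below uses only the numbers listed in this section; the dictionary keys and the `speedup` helper are names chosen for the example.

```python
# Benchmark figures as listed above (QPS, or microseconds for latency).
results = {
    "async_unary_qps": {"grpyc": 60263, "grpcio": 14595},
    "unary_p50_us": {"grpyc": 208, "grpcio": 451},
    "cross_node_qps": {"grpyc": 10145, "fastapi": 5261, "grpcio": 3279},
}


def speedup(metric, baseline, higher_is_better=True):
    """Ratio by which grpyc beats `baseline` on `metric`."""
    grpyc = results[metric]["grpyc"]
    other = results[metric][baseline]
    # For latency, lower is better, so the ratio is inverted.
    return grpyc / other if higher_is_better else other / grpyc


speedups = {
    "async unary QPS vs grpcio": speedup("async_unary_qps", "grpcio"),
    "unary P50 latency vs grpcio": speedup("unary_p50_us", "grpcio", higher_is_better=False),
    "cross-node QPS vs grpcio": speedup("cross_node_qps", "grpcio"),
    "cross-node QPS vs FastAPI": speedup("cross_node_qps", "fastapi"),
}

for label, value in speedups.items():
    print(f"{label}: {value:.2f}x")
```

On these figures, grpyc comes out at roughly 4.13x grpcio on async unary QPS, 2.17x on P50 latency, 3.09x on cross-node QPS, and 1.93x FastAPI on cross-node QPS.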

vLLM...
