Benchmarking AI Gateways: GoModel vs. LiteLLM vs. Portkey vs. Bifrost

santiago-pl1 pts1 comments

Benchmarking AI Gateways: GoModel vs LiteLLM vs Portkey vs Bifrost | enterpilot Blog<br>← Back to all posts Benchmarking AI Gateways: GoModel vs LiteLLM vs Portkey vs Bifrost<br>June 26, 2026<br>· Jakub A. Wasek

In October 2025 I tried to build my startup on top of LiteLLM.

At first it looked like the obvious choice. It supported many providers, it had<br>an OpenAI-compatible API, and it was already used by a lot of people. I did not<br>want to write an AI gateway. I wanted to build the product behind it.

Then I started running it on the hot path.

My opinion changed there.

A gateway is not a dashboard or integration glue you call once in a while. It<br>sits on every request, every retry, every stream, every tool call, every<br>fallback, every timeout.

A heavy gateway charges rent forever.

Most AI gateway comparisons miss that part. They talk about provider count,<br>dashboards, tracing, and “support for 1000+ models”. Those things matter, but<br>they are not free. Before the gateway calls OpenAI, Anthropic, Gemini, vLLM, or<br>anything else, it has already spent your CPU, memory, cold-start time, and<br>operational budget.

I am not comparing full product maturity here. I am comparing how these gateways<br>behave on the hot path.

So I started writing GoModel: a small<br>open-source AI gateway and AI control plane in Go, with an OpenAI-compatible API<br>and explicit provider adapters.

When I launched GoModel on Hacker News,<br>I promised a real, reproducible benchmark. This article is that follow-up.

The benchmark question is simple:

How lean is each AI gateway when it sits on the request path?

That question runs through the whole benchmark: GoModel vs LiteLLM vs Portkey vs<br>Bifrost, measured by latency, throughput, memory, CPU, cold start, and image<br>size rather than landing pages or feature matrices.

The runtime footprint matters

Latency gets the easiest arguments. It rarely tells the whole story.

Most real LLM calls are dominated by inference time. If a model takes 2000 ms<br>to answer, the difference between 5 ms and 15 ms of proxy overhead is not<br>the main story.

The main story is the deployment envelope:

How much RAM does the gateway need under load?

How much CPU does it burn per request?

How many requests can it serve per core?

How fast does it cold-start?

How large is the Docker image?

Can you run it as a sidecar, on a small VM, in serverless, or near local<br>models?

Is the core gateway actually open-source?

Those numbers decide whether the gateway can run where you want it to run.

A 372 MB compressed image (1.2 GB unpacked) that idles around gigabytes of<br>RAM and takes 25 s to cold-start is a different operational thing than a<br>16 MB image that peaks at 37 MB of RAM and is serving traffic 0.56 s after<br>launch.

So I care about the runtime footprint.

What this benchmark does not prove

This benchmark does not prove that one gateway is best for every company.

I am not measuring:

bug counts or overall correctness

semantic cache quality

tracing UI quality

guardrail quality

admin dashboards

long-term provider maintenance

every possible provider-specific feature

total provider count

Those things matter. Some of them matter a lot.

LiteLLM in particular has more integrated providers and more gateway features<br>than GoModel today. If your first requirement is maximum provider coverage right<br>now, LiteLLM has a real advantage. This benchmark does not erase that. It<br>measures the runtime footprint of putting each gateway on the request path. In<br>practice, many smaller or newer providers already expose an OpenAI-compatible<br>API, so provider count is not always the same as practical routing coverage.

The benchmark measures one narrower thing: runtime and deployment overhead on<br>the request path .

That still matters, because the gateway is on the hot path. If you run high<br>request volume, local models, serverless workloads, edge workloads, or many small<br>model calls, the overhead stops being theoretical.

AI gateway benchmark setup

I tested four AI gateways people actually compare:

GoModel

LiteLLM

Portkey

Bifrost

Every gateway talked to the same instant mock backend , on purpose. I did not<br>want to benchmark OpenAI, Anthropic, AWS networking, or random internet jitter.<br>I wanted to isolate the gateway itself.

Each gateway ran one at a time, in Docker, on an AWS c7i.large with<br>2 vCPU and 4 GiB RAM, running the latest Amazon Linux 2023 AMI. The whole<br>thing is Terraform’d, runs with one command, and tears itself down afterwards.

I first ran this on a free-tier t2.micro. That was cheap and easy to<br>reproduce, but unfair to the heavier gateways. A 1 GiB machine cannot hold a<br>gateway that wants gigabytes of memory, so it starts swapping. At that point you<br>are benchmarking the host being too small.

So I moved to c7i.large: still small, but non-burstable and large enough that<br>nothing swaps. It also makes the LiteLLM setup more honest. LiteLLM recommends<br>one worker per vCPU, and this machine has 2 vCPUs, so LiteLLM gets 2<br>workers. That gives...

gateway litellm benchmark gomodel provider gateways

Related Articles