Can Java Microservices Be As Fast As Go? A 2026 Benchmark Update | by Mark Nelson | Helidon | Jun, 2026 | Medium
Helidon
The official project Helidon blog containing articles from Helidon developers and the developers community. All articles are approved by the Helidon team.
Can Java Microservices Be As Fast As Go? A 2026 Benchmark Update
Mark Nelson
9 min read · Jun 8, 2026
Six years ago, Peter Nagy and I asked a question that was simple enough to be fun and annoying enough to be useful: can Java microservices be as fast as Go microservices? It was not meant to be a language war, because those are usually subjective anyway, and worse, they tend to make people less curious. The practical question was smaller: if you take a small HTTP service, implement it carefully in Go and Java, and run it on the same hardware, do the results land in the same performance neighborhood?

In 2020, the answer was yes for the small case. The shape I remembered, and wanted to test again, was that Java became more interesting as the workload and the machine got larger. So the 2026 question is not “did Go lose?” or “did Java solve all computer problems and also fold the laundry?” It is this: for this service, on this machine, with current runtimes, what happens as payload size and concurrency go up?

The companion repository for this article is markxnelson/go-java-go-2026. It includes the service code, benchmark scripts, raw results, summary tables, and chart-generation script.

The Baseline

For this run I used:
Go 1.26.3
Oracle JDK 26.0.1
Helidon SE 4.4.1
Linux on x86_64
Intel Xeon W-11855M, 6 cores / 12 threads
128 GiB RAM

The Go service uses the standard library net/http server. No framework. No middleware stack.

The Java service uses Helidon SE WebServer. Helidon 4 uses virtual threads for request handling, and the service health endpoint confirmed that request work was running on virtual threads.

For the Java side I measured two runtime shapes:
Oracle JDK JVM
Oracle JDK with a Leyden AOT cache

That is enough for this pass.
It keeps the article focused on the question I actually measured: a compact Go service against a compact Java service, both running sequentially on the same local machine.

The Service

Both services expose the same endpoints:
GET /health
GET /ready
GET /api/strings/{value}
GET /api/generated/{size}

The strings endpoint is useful for simple functional checks. The generated endpoint is the one I used for the benchmark matrix.

That distinction matters.

In an early run I tested a 2 KB input by putting a 2 KB string directly in the URL path. That mostly told me how each router handled a silly path parameter. Interesting, maybe, but not the thing I wanted to measure. The final full run uses /api/generated/{size} so the URL stays small and the application generates the requested input size inside the handler.

Each request does the same small unit of work:
uppercase the input
lowercase the input
reverse the input
compute a CRC32 hash
repeat extra CRC work according to WORK_FACTOR
return JSON with the result and runtime metadata

For the benchmark run, WORK_FACTOR=10. Request logging was off.

This is still a small synthetic service. It is not a shopping cart, a fraud system, or a payments API. It has no database, no TLS, no queue, no JSON parser on the inbound side, and no remote dependency. That is intentional. The point is to make the hot path small enough that runtime and server behavior are visible.

The Benchmark Shape

Please allow me to use the term “benchmark” a little loosely in this article.
The benchmark runner starts one service, runs the full matrix, stops it, and then starts the next service. Go and Java do not run at the same time, so they are not competing with each other for CPU or memory.

The benchmark run used:
payload sizes: 7, 128, 2048, 8192 bytes
concurrency levels: 1, 6, 12, 24, 48, 96, 192
repeats per cell: 2
warmup per cell: 2 seconds
measurement window: 5 seconds
work factor: 10

The runtime settings were explicit:

Go:
GOMAXPROCS=12
GOMEMLIMIT=off
Java JVM variants:
-XX:ActiveProcessorCount=12
-XX:MaxRAMPercentage=75
With Leyden:
-XX:+UnlockDiagnosticVMOptions
-XX:-AOTRecordTraining
-XX:-AOTReplayTraining

The combined result set used for the article lives under:
results/sequential_generated_leyden_feedback_full_20260608_0700432/

It includes the raw per-cell summary table, the peak-throughput table, the pivot table used for the charts, and the runtime configuration table.

The Small Tuning Detail That Changed The Java Result

Before the benchmark run, I hit a strange result.

The Helidon service looked fine for tiny responses, but larger generated responses had a suspicious latency floor around 44–48 ms when the Go load driver reused persistent HTTP/1.1 connections. A fresh curl request did not show the same behavior after warmup. That smelled less like application code and more like packet...