Measuring Input Latency with VK_EXT_present_timing

My side quest measuring input latency with VK_EXT_present_timing – Maister's Graphics Adventures


There have been two use cases I’ve been looking at recently where an objective and accurate metric for input latency is important. One is the push for AMD_anti_lag support in Mesa (which I need to get around to reviewing), where we need solid objective data that it’s actually helping. The other is my streaming solution PyroFling, where I want hard objective data demonstrating where the milliseconds are going.

I’ve been working on lots of plumbing in this area recently to hopefully help the ecosystem. We shouldn’t need weird hardware solutions to do this stuff. With VK_EXT_present_timing now being plumbed through the Linux driver stack, we have the API we need to do comparative analysis without too much fluff.

The Vulkan layer

I added a new layer to the PyroFling repo here. I documented how it works there, but the basic gist is to read back a small region of the swapchain and compare it to the previous frame to compute a Mean Square Error (MSE) metric. When this error spikes significantly compared to the previous N frames, we assume it happened due to input. This input is synthetically generated with /dev/uinput at somewhat random points in time. Using present timing, we get accurate metrics for when that present flowed through the system, and we can infer latency from when we generated the synthetic input and when the changed frame hit the screen.
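The detection step described above can be sketched roughly as follows. This is a minimal illustration and not the layer's actual code: the names `mean_square_error` and `SpikeDetector`, and the `history`, `ratio` and `floor` values, are assumptions chosen for the example, and the region is modelled as a flat list of pixel values.

```python
from collections import deque
from statistics import mean


def mean_square_error(current, previous):
    """MSE between two equally sized sequences of pixel values."""
    if len(current) != len(previous):
        raise ValueError("regions must have the same size")
    return sum((a - b) ** 2 for a, b in zip(current, previous)) / len(current)


class SpikeDetector:
    """Flags a frame whose MSE is far above the baseline of recent frames."""

    def __init__(self, history=8, ratio=10.0, floor=1.0):
        # history, ratio and floor are illustrative, not taken from the layer.
        self.recent = deque(maxlen=history)
        self.ratio = ratio
        self.floor = floor

    def update(self, mse):
        """Feed one per-frame MSE value; returns True if it looks like a spike."""
        baseline = mean(self.recent) if self.recent else 0.0
        warmed_up = len(self.recent) == self.recent.maxlen
        spike = warmed_up and mse > max(self.floor, baseline * self.ratio)
        # Keep spikes out of the baseline so one input does not mask the next.
        if not spike:
            self.recent.append(mse)
        return spike
```

The `floor` term keeps small, steady noise such as TAA jitter from triggering a detection when the baseline is close to zero.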

While the layer is active, the center of the screen shows an "error" image. Before starting a capture, this square should be mostly black. TAA jitter on a stable scene can be seen in this view too, and that’s fine for this layer. A small delta input is generated, which should show up as large deltas. E.g. if I move the camera while taking a screenshot:

After a run, we can do analysis. This is a CPU bound game on my lopsided system with a 9070 XT and an old Zen 2 CPU.

Typical CPU bound case

Analyzing /tmp/latency-measurement-wine64-preloader-2026-07-01-12-16-26.csv
Average frame time: 9.21 ms (108.58 Hz)

PresentComplete is determined by stage: Dequeued
Dequeued: Used on Xwayland. Does not exactly represent when image is flipped on screen, but rather when compositor commits to displaying the image. A few milliseconds are expected.
FirstPixelOut: Used on most compositors. Represents when GPU flips image on display controller.
FirstPixelVisible: Represents when photons are actually emitted by display. Not supported by any known implementation.

Gap between input stimulus and PresentComplete: Represents overall felt latency
Average 22.1 ms (confirms that the game is quite responsive)
Standard Deviation +/- 2.8595 ms
Median 22.4 ms
Range [16.7, 28.3] ms

Gap between input stimulus and GPU idle: Represents overall felt latency under ideal VRR conditions
Average 21.7 ms
Standard Deviation +/- 2.8383 ms
Median 21.8 ms
Range [16.3, 27.9] ms

Gap between input stimulus and QueuePresent: If this is large, we are likely CPU bound or application is buffering input a lot
Average 14.8 ms (about 1.5 frames, as expected for CPU bound game)
~0.5 frames for input polling jitter and 1 frame for CPU commands
Standard Deviation +/- 2.7076 ms
Median 15.0 ms
Range [9.49, 20.6] ms

Gap between QueuePresent and GPU idle: If this is large, we are likely GPU bound and would benefit from anti-lag
Average 6.85 ms (

All numbers look just like I expect.
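To make the report above easier to read, here is a small sketch of how such a per-gap summary could be computed from paired timestamps, and how a latency converts to a frame count. The function names are hypothetical, timestamps are assumed to be in milliseconds, and the sample standard deviation is an assumption; the actual analysis script may differ on all of these.

```python
from statistics import mean, median, stdev


def summarize_gap(start_ms, end_ms):
    """Summarize the gap between two timestamp series (at least two samples)."""
    gaps = [end - start for start, end in zip(start_ms, end_ms)]
    return {
        "average": mean(gaps),
        "stddev": stdev(gaps),  # sample standard deviation (assumption)
        "median": median(gaps),
        "range": (min(gaps), max(gaps)),
    }


def in_frames(latency_ms, frame_time_ms):
    """Express a latency as a multiple of the average frame time."""
    return latency_ms / frame_time_ms
```

For the run above, `in_frames(14.8, 9.21)` comes out at roughly 1.6, which lines up with the "about 1.5 frames" annotation on the gap between input stimulus and QueuePresent.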

Heavily GPU bound case

To stress test anti-lag a bit, Cyberpunk 2077 with RT is a good candidate since it completely slams my GPU at native-res + heavy RT:

Some TAA instability comes through in the delta box. Without anti-lag, we see the culprit right away:

Average frame time: 16.809 ms (59.493 Hz)

Gap between input stimulus and PresentComplete: Represents overall felt latency
Average 75.9 ms (yikes)
Standard Deviation +/- 6.4658 ms
Median 77.5 ms
Range [65.8, 86.3] ms

Gap between QueuePresent and GPU idle: If this is large, we are likely GPU bound and would benefit from anti-lag
Average 33.2 ms
Standard Deviation +/- 0.9738 ms
Median 33.1 ms
Range [31.8, 36.1] ms

A full 2 frames of GPU latency, which is bad. We submit work to the GPU long, long before it goes idle from the previous frame, oversubscribing it massively. This is what Reflex/AntiLag attacks: it adds delays such that we barely keep the GPU fully subscribed, but no more. With anti_lag it looks much better:

Gap between input stimulus and PresentComplete: Represents overall felt latency
Average 50.0 ms (clawed back 25 ms latency, nice)
Standard Deviation +/- 4.3821 ms
Median 50.5 ms
Range [43.4, 60.8] ms

Gap between QueuePresent and GPU idle: If this is large, we are likely GPU bound and would benefit from anti-lag
Average 16.9 ms (about 1 frame, which is what we...
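The effect of pacing on the QueuePresent-to-GPU-idle gap can be illustrated with a toy queueing model. This is only a sketch under strong assumptions, not how AMD_anti_lag or Reflex is implemented: CPU and GPU times per frame are constant, `depth` frames may be in flight, and the paced variant knows exactly when the GPU will go idle. The function `simulate` and all of its parameters are made up for the example.

```python
def simulate(frames, cpu_ms, gpu_ms, depth, paced):
    """Return (average QueuePresent-to-GPU-idle gap, average frame time) in ms."""
    present = []   # time of QueuePresent for each frame
    gpu_done = []  # time the GPU goes idle after each frame
    for i in range(frames):
        # The CPU starts a frame once the previous one has been submitted...
        cpu_start = present[i - 1] if i >= 1 else 0.0
        # ...and once a slot is free (at most `depth` frames in flight).
        if i >= depth:
            cpu_start = max(cpu_start, gpu_done[i - depth])
        # Idealized pacing: delay so the submit lands just as the GPU goes idle.
        if paced and i >= 1:
            cpu_start = max(cpu_start, gpu_done[i - 1] - cpu_ms)
        submit = cpu_start + cpu_ms
        gpu_start = max(submit, gpu_done[i - 1]) if i >= 1 else submit
        present.append(submit)
        gpu_done.append(gpu_start + gpu_ms)
    # Only look at the second half of the run to skip the warm-up frames.
    first = frames // 2
    steady = range(first, frames)
    gap = sum(gpu_done[i] - present[i] for i in steady) / len(steady)
    frame_time = (gpu_done[-1] - gpu_done[first]) / (frames - 1 - first)
    return gap, frame_time
```

With `cpu_ms=4`, `gpu_ms=16.8` and `depth=2` (all illustrative values), the unpaced run settles at a gap of about 29.6 ms, while the paced run settles at about 16.8 ms. Both keep the same 16.8 ms frame time, which is the point: the GPU stays busy, but work no longer sits in the queue behind the previous frame.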
