Investigating Linux graphics - Thomas Leonard's blog
I learn how to draw a triangle with a GPU, and then trace the code to find out how the graphics system works (or doesn't), looking at Mesa3D, GLFW, OpenGL, Vulkan, Wayland and Linux DRM.
Table of Contents
Introduction
Overview
OpenGL
Vulkan
Synchronisation
First attempt at tracing
Removing GLFW
Removing Vulkan's Wayland extension
Wayland walk-through
Kernel details with bpftrace
Start-up and library loading
Enumerating devices
Setting up the pipeline
Rendering one frame
Re-examining the errors
Conclusions
Introduction
In the past, I've avoided graphics driver problems on Linux by using only Intel integrated graphics. But, due to a poor choice of motherboard, I ended up needing a separate graphics card. Now my computer takes 14s to resume from suspend and dmesg is spewing this kind of thing:
[59829.886009] [drm] Fence fallback timer expired on ring sdma0
[59830.390003] [drm] Fence fallback timer expired on ring sdma0
[59830.894002] [drm] Fence fallback timer expired on ring sdma0

[79622.739495] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.1 test failed (-110)
[79622.909019] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.2 test failed (-110)
[79623.075056] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.3 test failed (-110)
[79623.241971] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.4 test failed (-110)
[79623.408604] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.6 test failed (-110)

[80202.893020] [drm] scheduler comp_1.0.1 is not ready, skipping
[80202.893023] [drm] scheduler comp_1.0.2 is not ready, skipping
[80202.893024] [drm] scheduler comp_1.0.3 is not ready, skipping
[80202.893025] [drm] scheduler comp_1.0.4 is not ready, skipping
[80202.893025] [drm] scheduler comp_1.0.6 is not ready, skipping
[80202.936910] [drm] scheduler comp_1.0.1 is not ready, skipping
But what is a "fence" or an "sdma0" ring? What are these comp_ schedulers, and why does Linux Oops when enough of them aren't ready? And why has Firefox been hanging when playing videos since I upgraded NixOS? I thought it was time I learnt something about how Linux graphics is supposed to work...
Overview
To show something on the screen, we allocate a chunk of memory (called a framebuffer) to hold the colour of each pixel. After calculating all the (millions of) colour values, we tell the display hardware the address of the framebuffer and it sends all the values to the monitor for display. While it's doing this, we can be rendering the next frame to another framebuffer.
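To make the framebuffer idea concrete, here is a minimal Python sketch (not code from this post) of one: a flat block of bytes with 4 bytes per pixel. It assumes an XRGB8888-style layout (a 32-bit 0x00RRGGBB value stored little-endian), which is just one common pixel format among many, and the `Framebuffer` and `DoubleBuffer` class names are hypothetical. Rendering into one buffer while the other is displayed is modelled as a simple swap; on real hardware that corresponds to a page flip, where the display is pointed at the other buffer's address.

```python
import struct


class Framebuffer:
    """A block of memory holding one 32-bit colour value per pixel.

    Assumes an XRGB8888-style layout: each pixel is the little-endian
    32-bit value 0x00RRGGBB, so in memory the bytes are B, G, R, unused.
    """

    BYTES_PER_PIXEL = 4

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        # Stride: number of bytes from the start of one row to the next.
        self.stride = width * self.BYTES_PER_PIXEL
        self.data = bytearray(self.stride * height)

    def _offset(self, x: int, y: int) -> int:
        return y * self.stride + x * self.BYTES_PER_PIXEL

    def set_pixel(self, x: int, y: int, r: int, g: int, b: int) -> None:
        struct.pack_into("<I", self.data, self._offset(x, y), (r << 16) | (g << 8) | b)

    def get_pixel(self, x: int, y: int) -> tuple:
        (value,) = struct.unpack_from("<I", self.data, self._offset(x, y))
        return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


class DoubleBuffer:
    """Two framebuffers: one being displayed, one being rendered into."""

    def __init__(self, width: int, height: int) -> None:
        self.front = Framebuffer(width, height)  # what the display shows
        self.back = Framebuffer(width, height)   # where the next frame is drawn

    def swap(self) -> None:
        # Stand-in for telling the display hardware to use the other buffer.
        self.front, self.back = self.back, self.front


screens = DoubleBuffer(640, 480)
screens.back.set_pixel(10, 20, 255, 0, 0)  # draw into the hidden buffer
screens.swap()                             # the finished frame becomes visible
```

A 640x480 frame needs 640 * 480 * 4 = 1,228,800 bytes in this layout, which gives a feel for the "millions of" values a full-HD or larger screen needs per frame.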
A computer's main processor (CPU) isn't well optimised for this kind of work, but a graphics card speeds things up. A graphics card is like a second computer, with its own memory, processors and display hardware, but optimised for graphics:
The main computer (host)
Typically has a small number of very fast processors.
A graphics card
Has a large number of relatively slow processors.
The graphics card architecture is useful because we can split the screen into many small tiles and render them in parallel on different processors. Running the processors relatively slowly saves energy (and heat), allowing us to have more of them.
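The tile idea can be sketched on the CPU too. This Python illustration (not how any real GPU or driver schedules work) cuts a screen into square tiles and hands each tile to a worker; because tiles never overlap, the workers never write to the same pixel and need no locking. The `shade` function is a hypothetical stand-in for the per-pixel colour calculation, and the tile size of 16 is arbitrary. In a standard CPython build the global interpreter lock means these threads won't actually run pure-Python code faster; the point is the partitioning, which a GPU spreads across its many hardware cores.

```python
from concurrent.futures import ThreadPoolExecutor


def tiles(width, height, tile_size):
    """Yield (x0, y0, x1, y1) rectangles covering the screen, clipped at the edges."""
    for y0 in range(0, height, tile_size):
        for x0 in range(0, width, tile_size):
            yield x0, y0, min(x0 + tile_size, width), min(y0 + tile_size, height)


def shade(x, y):
    """Hypothetical per-pixel colour calculation: a simple repeating pattern."""
    return ((x * 7) % 256, (y * 13) % 256, 128)


def render_tile(pixels, width, tile):
    """Fill one tile; each tile covers its own pixels, so no locking is needed."""
    x0, y0, x1, y1 = tile
    for y in range(y0, y1):
        for x in range(x0, x1):
            pixels[y * width + x] = shade(x, y)


def render(width, height, tile_size=16, workers=4):
    pixels = [None] * (width * height)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(render_tile, pixels, width, t)
                for t in tiles(width, height, tile_size)]
        for job in jobs:
            job.result()  # re-raise any error from a worker
    return pixels
```

For example, a 40x20 screen with 16-pixel tiles gives 3 x 2 = 6 tiles, the ones on the right and bottom edges being clipped to fit.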
Note: a GPU (Graphics Processing Unit) doesn't have to be on a separate card; it can also be part of the main computer and use the main memory instead of dedicated RAM.
Usually we run multiple applications and have them share the screen. Ideally, each application (e.g. Firefox) runs code on the GPU to render its window contents to GPU memory (1), then shares the reference to that memory with the display server (2). The display server (Sway in my case) runs more code on the GPU (3) to copy this window into the final image (4), which the hardware sends to the screen (5):
Wayland desktop with a graphics card
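Steps (3) and (4) amount to copying each window's pixels into the right place in the output image. Here is a minimal CPU-side Python sketch of that logic, assuming windows are simple rectangles painted in stacking order (later windows on top) and clipped to the screen edges. As described above, a real compositor does this copy on the GPU, and the application hands over a reference to its buffer (on Linux, typically a dma-buf file descriptor) rather than the pixels themselves; the `Window` class and `composite` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    x: int        # screen position of the window's top-left corner
    y: int
    pixels: list  # rows of colour values rendered by the application


def composite(screen_w, screen_h, windows, background=0):
    """Copy each window into a new output image, bottom window first."""
    out = [[background] * screen_w for _ in range(screen_h)]
    for win in windows:  # later windows overwrite earlier ones
        for row_idx, row in enumerate(win.pixels):
            sy = win.y + row_idx
            if not 0 <= sy < screen_h:
                continue  # row is off the top or bottom of the screen
            for col_idx, colour in enumerate(row):
                sx = win.x + col_idx
                if 0 <= sx < screen_w:
                    out[sy][sx] = colour
    return out
```

With a 2x2 window of colour 1 at (0, 0) and a 2x2 window of colour 2 at (1, 1) on a 3x3 screen, the second window covers the overlapping pixel, giving rows `[1, 1, 0]`, `[1, 2, 2]`, `[0, 2, 2]`.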
Application processes send instructions to the GPU via a Linux kernel driver (amdgpu in my case). Every GPU has its own API, and these are very low-level, so applications generally use the Mesa library to provide an API that works across all devices. Mesa has backends for all the different GPUs, and also a software-rendering fallback if there isn't a GPU available.
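The idea of one API with many backends can be pictured as a lookup from the kernel driver to a Mesa driver, with a CPU fallback. This is a deliberately simplified Python illustration, not Mesa's actual loader code: the table is an assumption covering a few well-known pairings (Mesa's radeonsi driver for amdgpu, iris for recent Intel GPUs on i915, llvmpipe as the software renderer), and the values accepted for `LIBGL_ALWAYS_SOFTWARE` are a guess rather than Mesa's exact parsing rules.

```python
import os

# Simplified, illustrative mapping from Linux kernel driver name to a Mesa
# OpenGL driver. Mesa's real logic is more involved (for example, older Intel
# GPUs on i915 use a different Mesa driver than iris); treat this as assumed.
KERNEL_TO_MESA_GL = {
    "amdgpu": "radeonsi",
    "i915": "iris",
    "nouveau": "nouveau",
}

SOFTWARE_FALLBACK = "llvmpipe"  # Mesa's CPU-based renderer


def pick_gl_driver(kernel_driver, env=None):
    """Choose a (hypothetical) backend name for the given kernel driver."""
    env = os.environ if env is None else env
    # Forcing software rendering overrides any GPU that might be present.
    # The accepted spellings here are an assumption for illustration.
    if env.get("LIBGL_ALWAYS_SOFTWARE", "").lower() in ("1", "true", "yes"):
        return SOFTWARE_FALLBACK
    if kernel_driver is None:  # no GPU device found
        return SOFTWARE_FALLBACK
    return KERNEL_TO_MESA_GL.get(kernel_driver, SOFTWARE_FALLBACK)
```

For example, `pick_gl_driver("amdgpu", {})` gives `"radeonsi"`, while adding `{"LIBGL_ALWAYS_SOFTWARE": "true"}` to the environment switches it to `"llvmpipe"`.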
"The Linux graphics stack in a nutshell" has more explanations, but I wanted to try it out myself...
OpenGL
Mesa supports OpenGL, a cross-platform standard API for graphics. However, you also need some platform-specific code to open a window and connect up a suitable backend. After a bit of searching I found GLFW (Graphics Library FrameWork), which has a nice tutorial showing how to draw a triangle into a window.
That worked, and I got a window with a colourful rotating triangle, animating smoothly even fullscreen, while showing no CPU load (use LIBGL_ALWAYS_SOFTWARE=true to compare with software rendering):
OpenGL triangle
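Setting LIBGL_ALWAYS_SOFTWARE makes the CPU do the work the GPU normally does. As a rough picture of that work, here is a minimal Python software rasteriser for a rotating red/green/blue triangle: rotate the vertices about the Z axis (the rotation that makes the triangle spin), then for each pixel use edge functions to decide whether it is inside the triangle, and blend the three vertex colours by barycentric weights. The vertex coordinates are illustrative, and this textbook approach is not how Mesa's llvmpipe is implemented (llvmpipe JIT-compiles shaders with LLVM).

```python
import math

# Illustrative vertices in normalised device coordinates (x, y in -1..1),
# each with an (r, g, b) colour in 0..1.
TRIANGLE = [
    ((-0.6, -0.4), (1.0, 0.0, 0.0)),
    ((0.6, -0.4), (0.0, 1.0, 0.0)),
    ((0.0, 0.6), (0.0, 0.0, 1.0)),
]


def rotate_z(x, y, angle):
    """Rotate a point about the origin by angle radians."""
    c, s = math.cos(angle), math.sin(angle)
    return x * c - y * s, x * s + y * c


def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p); its sign says which side of a->b p is on."""
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax)


def rasterise(width, height, angle):
    """Return rows of (r, g, b) byte tuples; pixels outside the triangle are black."""
    # Vertex stage: rotate, then map -1..1 to pixel coordinates (y points down).
    verts = []
    for (x, y), colour in TRIANGLE:
        rx, ry = rotate_z(x, y, angle)
        verts.append(((rx + 1) / 2 * width, (1 - ry) / 2 * height, colour))
    (x0, y0, c0), (x1, y1, c1), (x2, y2, c2) = verts
    area = edge(x0, y0, x1, y1, x2, y2)  # non-zero for this triangle
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            sx, sy = px + 0.5, py + 0.5  # sample at the pixel centre
            # Barycentric weights; all non-negative means the pixel is inside.
            w0 = edge(x1, y1, x2, y2, sx, sy) / area
            w1 = edge(x2, y2, x0, y0, sx, sy) / area
            w2 = edge(x0, y0, x1, y1, sx, sy) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # Fragment stage: blend the vertex colours by the weights.
                row.append(tuple(round(255 * (w0 * ca + w1 * cb + w2 * cc))
                                 for ca, cb, cc in zip(c0, c1, c2)))
            else:
                row.append((0, 0, 0))
        image.append(row)
    return image
```

Calling `rasterise` with a steadily increasing angle produces the animation. Doing this for every pixel of a fullscreen window on every frame is the kind of work that loads the CPU in software mode; with the GPU doing it, the triangle program's CPU side only has to set things up and submit work.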
It starts by setting everything...