Computing Camera Rays

Published 2026-06-20

We are in a transition period in which ray tracing and rasterization coexist. Typical real-time rendering pipelines still use rasterization (often with deferred shading) for primary visibility and then trace rays for shadows, reflections or global illumination. However, we are getting to the point where one can seriously consider ditching rasterization and using ray tracing for primary visibility as well, at least on some platforms. Then you need to compute camera rays, characterized by ray origin, ray direction and ray length, in a way that is consistent with what you would otherwise do for rasterization. In rasterization, you commonly specify the camera with a world to clip space transformation matrix, also known as the view-projection matrix. In this blog post, I will derive how to compute camera rays based on such a matrix.

The goal is to get something that works equally well for perspective projection, orthographic projection and whatever else such a matrix may represent. The obvious approach turns out to be prone to numerical cancellation, so I present an alternative that works much more reliably. Overall, this is not a hard problem: For any given camera model (e.g. perspective projection with known field of view), it is quite easy to come up with an ad hoc solution that will work. Still, I find it valuable to have a solution that is based on readily available transformation matrices and does not require any further tinkering per camera model. If you do not care about the derivation, feel free to just copy the shader code in Listing 4 (which I hereby release into the public domain, like all code in this blog post).

Camera rays in clip space

The rasterization pipeline and clip space rely heavily on homogeneous coordinates, so let us begin by reviewing that. If we have a point with 3D Cartesian coordinates \((x^\prime, y^\prime, z^\prime)^\mathsf{T}\), we can get homogeneous coordinates for that point by simply attaching a 1 as fourth coordinate: \((x^\prime,y^\prime,z^\prime,1)^\mathsf{T}\). This still describes a 3D point, but we have now gained the freedom to scale its coordinates by any non-zero factor \(w\neq 0\), which gives us

\[(x,y,z,w)^\mathsf{T} = (wx^\prime, wy^\prime, wz^\prime, w)^\mathsf{T}\text{.}\]

No matter how we choose \(w\), these coordinates still describe the same point. We can recover the inhomogeneous coordinates (i.e. dehomogenize) by dividing by the fourth component \(w\):

\[\frac{1}w(x,y,z,w)^\mathsf{T} = \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}, 1\right)^\mathsf{T} = (x^\prime,y^\prime,z^\prime,1)^\mathsf{T}\text{.}\]
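To make the round trip concrete, here is a minimal Python sketch of the two steps above. The function names `homogenize` and `dehomogenize` are my own choice for illustration and do not come from any particular library.

```python
def homogenize(point, w=1.0):
    """Scale Cartesian (x', y', z') by w and append w as fourth coordinate."""
    if w == 0.0:
        raise ValueError("w must be non-zero")
    x, y, z = point
    return (w * x, w * y, w * z, w)


def dehomogenize(coords):
    """Divide by the fourth component to recover (x', y', z')."""
    x, y, z, w = coords
    if w == 0.0:
        raise ValueError("cannot dehomogenize with w == 0")
    return (x / w, y / w, z / w)


point = (1.0, 2.0, 3.0)
# Different choices of w give different 4D vectors for the same 3D point.
same_point_a = homogenize(point, 1.0)
same_point_b = homogenize(point, -4.0)
```

Both `same_point_a` and `same_point_b` dehomogenize to the original `point`, although their four components differ.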

Homogeneous coordinates make many formulas simpler. For example, we will see below that we can also write down homogeneous coordinates for planes in 3D space, and then to check whether a point is on a plane, we just take a dot product. They also allow us to express translation with \(4\times 4\) matrices. Furthermore, they are useful for rasterization with a perspective projection. A perspective projection inherently requires us to perform a division at some point. With homogeneous coordinates, this division happens shortly before rasterization and it is simply the dehomogenization mentioned above.
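As a small illustration of two of these claims, the following Python sketch expresses a translation as a \(4\times 4\) matrix and tests whether a point lies on a plane with a single dot product. It assumes column vectors, so matrices are applied from the left, and it assumes that a plane with normal \((a,b,c)^\mathsf{T}\) and offset \(d\) is stored as \((a,b,c,d)^\mathsf{T}\). The helper names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a 4x4 matrix (tuple of rows) by a 4D column vector."""
    return tuple(
        sum(matrix[row][col] * vector[col] for col in range(4))
        for row in range(4)
    )


def translation(tx, ty, tz):
    """4x4 matrix that translates homogeneous points by (tx, ty, tz)."""
    return (
        (1.0, 0.0, 0.0, tx),
        (0.0, 1.0, 0.0, ty),
        (0.0, 0.0, 1.0, tz),
        (0.0, 0.0, 0.0, 1.0),
    )


def dot4(a, b):
    """Dot product of two 4D vectors."""
    return sum(x * y for x, y in zip(a, b))


# The plane z' = 5, written as a*x' + b*y' + c*z' + d = 0.
plane = (0.0, 0.0, 1.0, -5.0)
start = (1.0, 2.0, 3.0, 1.0)
moved = mat_vec(translation(0.0, 0.0, 2.0), start)
# Scaling all four coordinates does not change the point.
moved_scaled = tuple(3.0 * c for c in moved)
```

Here `dot4(plane, start)` is non-zero because the point is off the plane, whereas `dot4(plane, moved)` and `dot4(plane, moved_scaled)` both vanish. The test is therefore independent of the choice of \(w\).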

Along with this design of rasterizers comes the notion of clip space. For screen space, we use a coordinate frame where coordinates \(x^\prime_c\) and \(y^\prime_c\) range from -1 to 1 across the extent of the camera frustum (the subscript \(c\) stands for clip space). In homogeneous coordinates, these bounds translate to \(-w_c\leq x_c\leq w_c\) and \(-w_c\leq y_c\leq w_c\). In addition, we define a near and far clipping plane based on the clip space z-coordinate. The far clipping plane is at \(z^\prime_c=1\), which translates to \(z_c\leq w_c\). The near clipping plane is defined differently, depending on the API: For Direct3D, the inequality is \(0\leq z_c\). For OpenGL, the default behavior is that the near clipping plane is at \(-w_c\leq z_c\), but this has been made configurable through the extension GL_ARB_clip_control, which lets you choose the Direct3D behavior of \(0\leq z_c\). This extension has moved into core functionality in OpenGL 4.5. For Vulkan, the behavior is similarly configurable. To account for these differences, I will use a variable \(z^\prime_n\) which is \(0\) for the Direct3D conventions and \(-1\) for the old OpenGL default, i.e. either way the near clipping plane is \(z^\prime_n w_c\leq z_c\).
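The inequalities above can be collected into one small test. The following Python sketch is only an illustration of the conventions, not how any API implements clipping. The function name `inside_clip_volume` and the parameter name `z_near` (standing in for \(z^\prime_n\)) are my own.

```python
def inside_clip_volume(clip, z_near=0.0):
    """Check homogeneous clip space coordinates against the view volume.

    z_near is 0.0 for the Direct3D convention and -1.0 for the old
    OpenGL default described in the text.
    """
    x, y, z, w = clip
    return (
        -w <= x <= w
        and -w <= y <= w
        and z_near * w <= z <= w
    )


# A point slightly "behind" z = 0 but within -w <= z.
candidate = (0.0, 0.0, -0.5, 1.0)
d3d_result = inside_clip_volume(candidate, z_near=0.0)
gl_result = inside_clip_volume(candidate, z_near=-1.0)
```

The same coordinates are rejected under the Direct3D convention and accepted under the old OpenGL default, which is exactly the difference that \(z^\prime_n\) is meant to capture.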

A typical renderer will use lots of other coordinate frames, such as camera space and object space for every single object, but the only other one that we care about here is world space, because we want to get the ray in world space. In the context of ray tracing, it is pretty easy to define what we mean by world space: It is whatever the top-level acceleration structure uses. For rasterization, we then prepare the world to clip space matrix \(M_{w,c}\in\mathbb{R}^{4\times4}\) that...
