Gaussian Splatting for Dummies | Darshan Makwana
Gaussian Splatting is a fascinating scene reconstruction technique introduced by INRIA, and last year I had a lot of fun tinkering with it while on my semex. I recently rediscovered some of my notes on it and decided to digitize them this weekend. Along the way I reimplemented the forward rasterization pass in Rust and decided it would be fun to write a tutorial explaining Gaussian splatting to everyone, so here it is.
what is a gaussian splat?
a 3D Gaussian splat is an oriented ellipsoid in space that carries some color and opacity. you can think of it as a fuzzy colored blob. a scene is made of hundreds of thousands of these blobs, and when you look at them from a particular viewpoint, they overlap and blend to form the final image
We represent each gaussian with these attributes:
use glam::{Quat, Vec3}; // assumption: the vector types come from the glam crate<br><br>pub struct Splat {<br>pub pos: Vec3, // center position in world space<br>pub scale: Vec3, // size along each local axis<br>pub rot: Quat, // orientation as a unit quaternion<br>pub color: Vec3, // RGB color (already decoded from spherical harmonics)<br>pub opacity: f32, // how opaque this blob is, in [0, 1]<br>}
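in the file, scales are stored in log space, opacities as logits, and colors as spherical harmonic coefficients. the raw values are unconstrained, so decoding them (exp for scales, sigmoid for opacities, normalization to unit length for quaternions) ensures each attribute lies within its valid range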
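since scales live in log space and opacities are logits, decoding comes down to an exp and a sigmoid, plus a normalization for the quaternion. here is a minimal sketch of that step. it uses plain arrays instead of a math crate so it has no dependencies, and the function names are made up for illustration rather than taken from my implementation:

```rust
/// Logistic sigmoid: maps any real logit to an opacity in (0, 1).
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Scales are stored as logarithms, so exp recovers strictly positive sizes.
fn decode_scale(log_scale: [f32; 3]) -> [f32; 3] {
    log_scale.map(f32::exp)
}

/// Rescale a quaternion to unit length so it represents a pure rotation.
/// The component order does not matter for this step.
fn normalize_quat(q: [f32; 4]) -> [f32; 4] {
    let len = q.iter().map(|c| c * c).sum::<f32>().sqrt();
    q.map(|c| c / len)
}
```

a logit of 0 decodes to an opacity of 0.5, and a log-scale of 0 decodes to a scale of 1. this sketch does not guard against a zero-length quaternion.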
spherical harmonic (SH) coefficients are just a frequency-domain representation of a color function defined over the unit sphere. why spherical harmonics? because in the real world, the color of a surface depends on the viewing direction, and SH coefficients encode this view-dependent appearance compactly.
SH functions are organized in bands (like octaves in music). higher bands add more coefficients and thus capture finer detail. the INRIA 3DGS format stores up to band 3 (16 coefficients per color channel, so 48 per splat for RGB)
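the 48 comes from simple counting: band l has 2l + 1 basis functions, so bands 0 through 3 give 1 + 3 + 5 + 7 = 16 per color channel. a small sketch of that arithmetic (the helper names are hypothetical):

```rust
/// Number of SH basis functions per color channel when bands 0..=max_band are kept.
/// Band l contributes 2l + 1 functions, which sums to (max_band + 1)^2.
fn sh_coeffs_per_channel(max_band: u32) -> u32 {
    (0..=max_band).map(|l| 2 * l + 1).sum()
}

/// Total coefficients per splat for three color channels.
fn sh_coeffs_rgb(max_band: u32) -> u32 {
    3 * sh_coeffs_per_channel(max_band)
}
```

with `max_band = 0` this is 3 coefficients per splat (the DC term decoded below), and with `max_band = 3` it is 48.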
To decode band 0, note that the band-0 SH basis function is the constant $Y_0^0 = \frac{1}{2\sqrt{\pi}} \approx 0.282$. the conversion from SH coefficient to RGB is:
\[\text{color} = \text{clamp}\left(0.5 + C_0 \cdot f_{dc},\ 0,\ 1\right)\]
where $C_0 = Y_0^0$ and $f_{dc}$ is the 3-component DC coefficient from the file.
pub const SH_C0: f32 = 0.28209479177387814;
pub fn sh_band0_to_rgb(f_dc: Vec3) -> Vec3 {<br>(Vec3::splat(0.5) + SH_C0 * f_dc).clamp(Vec3::ZERO, Vec3::ONE)<br>}
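as a quick worked example, take a made-up DC coefficient $f_{dc} = (1,\ 0,\ -2)$ (chosen for illustration, not from a real file) and round $C_0$ to 0.2821:

\[\text{color} \approx \text{clamp}\left(0.5 + 0.2821 \cdot (1,\ 0,\ -2),\ 0,\ 1\right) = \text{clamp}\left((0.7821,\ 0.5,\ -0.0642),\ 0,\ 1\right) = (0.7821,\ 0.5,\ 0)\]

a zero coefficient maps to mid-gray 0.5, and the clamp only kicks in once a channel leaves [0, 1], as the blue channel does here.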
the forward pass pipeline
the forward pass turns a list of 3D Gaussians + a camera into a 2D image. here is an overview of the rendering pipeline:
Step 1: Projecting Splats
1.1: building the 3D covariance matrix
for each splat, given the raw (scale, rotation) pair, we need to construct a 3D covariance matrix $\Sigma$ that describes the shape and orientation of the Gaussian in world space. the formula is:
\[\Sigma = R \cdot S \cdot S^T \cdot R^T\]
where R is the 3×3 rotation matrix from the quaternion, and S is a diagonal matrix of scales. if we let M = R·S, this simplifies to:
\[\Sigma = M \cdot M^T\]
let r_mat = Mat3::from_quat(s.rot);<br>let s_mat = Mat3::from_diagonal(s.scale);<br>let m = r_mat * s_mat;<br>let cov3d = m * m.transpose();
Note: why do we decompose the covariance this way?
Covariance matrices have physical meaning only when they are positive semi-definite. gradient descent cannot easily be constrained to produce valid matrices, but a matrix of the form $M \cdot M^T$ is always positive semi-definite, since $x^T M M^T x = \lVert M^T x \rVert^2 \ge 0$ for any vector $x$. this is a reparametrization trick: we optimize scale and rotation, which need no such constraint, and the covariance we derive from them is always valid
what does this matrix actually look like? for a splat with scale = (0.1, 0.05, 0.02) and identity rotation:
\[\Sigma =<br>\begin{pmatrix}<br>0.01 & 0 & 0 \\<br>0 & 0.0025 & 0 \\<br>0 & 0 & 0.0004<br>\end{pmatrix}<br>\quad<br>\begin{aligned}<br>&= \text{diag}(0.1^2,\; 0.05^2,\; 0.02^2)<br>\end{aligned}\]
with identity rotation, it is just the squared scales on the diagonal, an axis-aligned ellipsoid
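to check the numbers above without pulling in a math crate, here is a dependency-free sketch of $\Sigma = M \cdot M^T$. the `M3` alias, the row-major layout, the (w, x, y, z) quaternion order and the function names are all assumptions made for this snippet, not the layout used by any particular library:

```rust
type M3 = [[f32; 3]; 3]; // row-major: m[row][col]

/// Rotation matrix from a unit quaternion given as (w, x, y, z).
fn quat_to_mat3(q: [f32; 4]) -> M3 {
    let [w, x, y, z] = q;
    [
        [1.0 - 2.0 * (y * y + z * z), 2.0 * (x * y - w * z), 2.0 * (x * z + w * y)],
        [2.0 * (x * y + w * z), 1.0 - 2.0 * (x * x + z * z), 2.0 * (y * z - w * x)],
        [2.0 * (x * z - w * y), 2.0 * (y * z + w * x), 1.0 - 2.0 * (x * x + y * y)],
    ]
}

/// Sigma = M * M^T with M = R * S, where S = diag(scale).
fn covariance_3d(q: [f32; 4], scale: [f32; 3]) -> M3 {
    let r = quat_to_mat3(q);
    // Multiplying by a diagonal S on the right scales column j of R by scale[j].
    let mut m = [[0.0f32; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            m[i][j] = r[i][j] * scale[j];
        }
    }
    // Sigma[i][j] = sum over k of M[i][k] * M[j][k], which is symmetric by construction.
    let mut sigma = [[0.0f32; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            for k in 0..3 {
                sigma[i][j] += m[i][k] * m[j][k];
            }
        }
    }
    sigma
}
```

calling `covariance_3d([1.0, 0.0, 0.0, 0.0], [0.1, 0.05, 0.02])` reproduces the diagonal matrix above. a 90° rotation about z swaps the first two diagonal entries: the ellipsoid keeps its shape and only its orientation changes.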
1.2: transforming into view space
the 3D covariance we just computed lives in world space. to project it onto the camera’s image plane, we first need to rotate it into view space, the coordinate system where the camera is at the origin, looking down −z
for the splat center, this is just a matrix-vector multiply with the 4×4 view matrix:
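let p_view4 = view * Vec4::new(s.pos.x, s.pos.y, s.pos.z, 1.0);<br>let p_view = Vec3::new(p_view4.x, p_view4.y, p_view4.z);<br>// cull splats that are closer than the near plane (or behind the camera) or beyond the far plane<br>if p_view.z > -znear || p_view.z < -zfar {<br>return None;<br>}<br>let zc = -p_view.z;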
note that zc = -p_view.z. our view space is right-handed with the camera looking down −z, so points in front of the camera have negative z. we use zc (positive in front) as the depth for sorting and projection.
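to make the sign convention concrete, here is a small standalone sketch of the transform and depth cull. it uses a row-major `[[f32; 4]; 4]` view matrix and a hypothetical `to_view_space` helper, both assumptions for this snippet rather than the types used in the code above:

```rust
type M4 = [[f32; 4]; 4]; // row-major: m[row][col]

/// Transform a world-space point into view space.
/// Returns the view-space position and its positive depth zc,
/// or None if the point falls outside the [znear, zfar] depth range.
fn to_view_space(view: &M4, pos: [f32; 3], znear: f32, zfar: f32) -> Option<([f32; 3], f32)> {
    let p = [pos[0], pos[1], pos[2], 1.0];
    let mut p_view = [0.0f32; 3];
    for (row, out) in p_view.iter_mut().enumerate() {
        *out = (0..4).map(|k| view[row][k] * p[k]).sum();
    }
    // The camera looks down -z, so visible points have z in [-zfar, -znear].
    if p_view[2] > -znear || p_view[2] < -zfar {
        return None;
    }
    Some((p_view, -p_view[2]))
}
```

with an identity view matrix (camera at the world origin), a point at z = −5 survives with depth zc = 5, while a point at z = +5 sits behind the camera and is culled.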
for the covariance, we rotate it by the 3×3 part of the view matrix W:
\[\Sigma_{view} = W \cdot \Sigma \cdot W^T\]
let w_mat = Mat3::from_mat4(view);<br>let w_mat_t = w_mat.transpose();
let cov3d_view = w_mat * cov3d * w_mat_t;
this is just the standard basis-change...