Additive Blending on the Nintendo 64


PhobosLab


Dominic Szablewski, @phoboslab

— Monday, May 4th 2026


Did you ever wonder why explosions and other effects looked so much cooler on the original PlayStation than they did on the Nintendo 64?

“Silent Bomber” for the PSX

“Star Fox 64” for the N64

The reason is additive blending! Or rather, in the N64 case, the lack thereof. While the N64 technically did support additive blending, it was practically unusable.

PSX

The PSX supports 4 different blend modes (in addition to just overwriting pixels) to control how sprites and geometry are mixed into the existing frame buffer:

0: (src + dst) / 2
1: src + dst
2: dst - src
3: dst + src/4

The one you see here in Silent Bomber is conceptually the simplest one: src + dst. That is, colors are just added to the existing ones in the frame buffer.

|                     | R   | G   | B   |
|---------------------|-----|-----|-----|
| src (sprite)        | 171 | 42  | 226 |
| + dst (framebuffer) | 63  | 141 | 170 |
| = result            | 234 | 183 | 255 |

Drawing a sprite over a scene can only ever make it brighter, never darker. Perfect for explosions, plasma beams and magic spells. Importantly, note how the B value in this example adds up to 396, but the PSX GPU helpfully clamps it to the maximum of 255.

(Aside: the PSX GPU actually only works in 16bit precision with 5bit color components, so values range from 0 to 31; the math is the same.)
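To make the four modes concrete, here is a minimal C sketch that applies each one to a single 5bit color component (0 to 31) and clamps the result into range, as described above. The enum and function names are hypothetical, and this models only the per-component arithmetic, not the actual PSX GPU pipeline.

```c
#include <stdint.h>

/* Hypothetical names for the four PSX blend modes listed above. */
typedef enum {
    PSX_BLEND_AVERAGE     = 0, /* (src + dst) / 2 */
    PSX_BLEND_ADD         = 1, /* src + dst       */
    PSX_BLEND_SUBTRACT    = 2, /* dst - src       */
    PSX_BLEND_ADD_QUARTER = 3  /* dst + src / 4   */
} psx_blend_mode;

/* Blend one 5bit component (0..31) and clamp the result to 0..31. */
static uint8_t psx_blend(psx_blend_mode mode, uint8_t src, uint8_t dst) {
    int result;
    switch (mode) {
    case PSX_BLEND_AVERAGE:     result = (src + dst) / 2; break;
    case PSX_BLEND_ADD:         result = src + dst;       break;
    case PSX_BLEND_SUBTRACT:    result = dst - src;       break;
    case PSX_BLEND_ADD_QUARTER: result = dst + src / 4;   break;
    default:                    result = dst;             break;
    }
    if (result < 0)  result = 0;   /* subtraction can drop below zero */
    if (result > 31) result = 31;  /* addition can exceed the 5bit range */
    return (uint8_t)result;
}
```

With mode 1, a component of 20 drawn over 20 yields 31 rather than 40: the clamp is what keeps bright overlaps bright.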

N64

The N64's “Reality Display Processor” (the fixed-function rasterizer, RDP for short) has a much more flexible way to control blending: a configurable “Blender” unit. This is somewhat similar to OpenGL's glBlendFunc().

Libdragon exposes this functionality with the RDPQ_BLENDER((P, A, Q, B)) macro, which instructs the RDP to compute (P * A) + (Q * B), where each slot can be one of several inputs.

Setting up additive blending with this is trivial:

```c
RDPQ_BLENDER(( IN_RGB, IN_ALPHA, MEMORY_RGB, ONE ))
```

The problem is, the RDP doesn't clamp the result.

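|                     | R   | G   | B                   |
|---------------------|-----|-----|---------------------|
| src (sprite)        | 171 | 42  | 226                 |
| + dst (framebuffer) | 63  | 141 | 170                 |
| = result            | 234 | 183 | 140 (wraps around!) |

The resulting output is less than desirable: wherever a channel overflows, it wraps around to a dark value.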
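The difference between the two tables comes down to saturating versus wrapping addition. The sketch below models an additive blend on one 8bit channel both ways, reproducing the numbers in the tables. It is an idealized model (coefficients of exactly one, no dithering or precision loss), and the function names are hypothetical.

```c
#include <stdint.h>

/* PSX-style: the sum is clamped to the maximum of 255. */
static uint8_t add_saturating(uint8_t src, uint8_t dst) {
    unsigned sum = (unsigned)src + dst;
    return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Unclamped, as in the N64 table: only the low 8 bits of the sum
 * survive, so 226 + 170 = 396 becomes 396 - 256 = 140. */
static uint8_t add_wrapping(uint8_t src, uint8_t dst) {
    return (uint8_t)(src + dst);
}
```

Both functions agree on the R and G channels; only B, whose sum exceeds 255, differs.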

Now, you could of course fall back to drawing such effects on the “Reality Signal Processor” (RSP), the vector co-processor of the N64. But that gets complicated quickly if you want to do rotation, scaling or any actual 3D stuff. The RDP is much better suited for this. Drawing is its job!

While the RDP can draw into a 32bit buffer, it was very uncommon for games to do so. Almost all N64 games used a 16bit framebuffer for the final output. With this in mind, I came up with a different plan:

Let the RDP draw onto a 32bit RGBA 8888 (8 bits per component) buffer, but have all our sprites in the 16bit RGBA 5551 (5 bits per color component, 1 bit alpha) range. I could just pre-process assets by dividing RGB by 8 (or right-shifting by 3 bits). This will essentially draw everything way too dark, but in turn gives us lots of headroom for additive blending.
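A quick sanity check of that headroom, as a C sketch: shifting an 8bit component right by 3 bits caps it at 31, so even a fully white sprite can be stacked 8 times on one pixel (8 × 31 = 248) before a wrapping 8bit sum overflows on the 9th (9 × 31 = 279). The function names are hypothetical, and the buffer is modeled as plain wrapping 8bit addition.

```c
#include <stdint.h>

/* Pre-process an 8bit component into the 5bit range (0..31). */
static uint8_t darken_to_5bit(uint8_t c) {
    return (uint8_t)(c >> 3);
}

/* Additively draw the same darkened component `layers` times onto `dst`,
 * using wrapping 8bit addition like an unclamped blender. */
static uint8_t stack_layers(uint8_t dst, uint8_t src, int layers) {
    uint8_t acc = dst;
    for (int i = 0; i < layers; i++) {
        acc = (uint8_t)(acc + darken_to_5bit(src));
    }
    return acc;
}
```

Dimmer sprites leave even more room: a component of 128 darkens to 16, so 15 layers (240) still fit.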

No wrap-around as long as the additively blended sprites sum to 255 or less.

Better yet, we don't have to do this image pre-processing offline. We can just instruct the blender to do it for us when drawing. For free!

```c
// Abuse the fog alpha value to draw all colors at 1/8th intensity
rdpq_set_fog_color(RGBA32(0, 0, 0, 256/8));
rdpq_mode_blender(RDPQ_BLENDER(( IN_RGB, FOG_ALPHA, MEMORY_RGB, ONE )));
```

So how do we get this back to normal brightness? Simple: use a 16bit frame buffer for display and “copy” all the 32bit colors into it. Because everything was drawn at 1/8th intensity, writing an 8bit component's value directly into a 5bit component restores roughly its full brightness. We just have to be careful to clamp all 8bit color components into the 5bit range first.

```c
#include <libdragon.h>

// Convert a 320x240 RGBA 8888 buffer into an RGBA 5551 buffer on the CPU
void cpu_rgba_8888_to_5551(uint32_t *rgba32_in, uint16_t *rgba16_out) {
    for (int i = 0; i < 320 * 240; i++) {
        color_t c = color_from_packed32(rgba32_in[i]);
        // Clamp each 8bit component to the 5bit maximum (31)
        if (c.r > 31) { c.r = 31; }
        if (c.g > 31) { c.g = 31; }
        if (c.b > 31) { c.b = 31; }
        // Pack as 5551: R in bits 15-11, G in 10-6, B in 5-1, alpha bit set
        rgba16_out[i] = (c.r << 11) | (c.g << 6) | (c.b << 1) | 0x1;
    }
}
```

Doing this on the CPU is of course prohibitively expensive: it takes about 70ms for a 320×240 frame. But this is where the RSP co-processor shines, since the problem has now become a simple, uniform per-pixel operation.

The RSP's 128bit vector instructions can process 8 pixels at a time. With some help from HailToDodongo on the #N64Brew Discord optimizing the GPU microcode, this now runs in about 3.1ms for the whole frame!
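The RSP's 128bit vector registers are split into eight 16bit lanes, so one register holds exactly eight 16bit output pixels. The following is a portable C sketch, not RSP code or RSPL, of the same clamp-and-pack conversion organized in groups of 8 pixels to mirror that lane structure. It assumes the packed 32bit layout puts R in the most significant byte (R, G, B, A from high to low), and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an 8bit component to the 5bit maximum, like a per-lane min(). */
static inline uint16_t clamp5(uint32_t c) {
    return (uint16_t)(c < 31 ? c : 31);
}

/* Convert one packed RGBA 8888 pixel (assumed R in the top byte) to 5551. */
static inline uint16_t pixel_8888_to_5551(uint32_t p) {
    uint16_t r = clamp5((p >> 24) & 0xFF);
    uint16_t g = clamp5((p >> 16) & 0xFF);
    uint16_t b = clamp5((p >> 8) & 0xFF);
    return (uint16_t)((r << 11) | (g << 6) | (b << 1) | 0x1);
}

/* Convert n pixels in groups of 8, one group per "vector step".
 * n must be a multiple of 8 (320 * 240 = 76800 is). */
static void convert_groups_of_8(const uint32_t *in, uint16_t *out, size_t n) {
    for (size_t i = 0; i < n; i += 8) {
        for (size_t lane = 0; lane < 8; lane++) {
            out[i + lane] = pixel_8888_to_5551(in[i + lane]);
        }
    }
}
```

On the RSP, each group's clamps and shifts would map onto vector instructions; plain C like this only mirrors the structure, not the speed.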

(Trivia: I'd like to interject for a moment. What is commonly referred to as “GPU microcode” in the context of the N64 is in fact, MIPS/assembly that runs on the RSP, or as I've recently taken to calling it, MIPS plus assembly.)

Modern tooling for N64 development is phenomenal. While it helps to have some understanding of assembly, you don't have to write MIPS assembly by hand anymore. HailToDodongo invented a C-like language called RSPL that directly compiles to it.

So the whole setup looks like this:

```c
// Init the display with a 16bit frame buffer
display_init(RESOLUTION_320x240, DEPTH_16_BPP, 3, GAMMA_NONE, FILTERS_DISABLED);

// Create a secondary 32bit render buffer and set it...
```

