Foveon — turn a Bayer photo into a Foveon X3 photo
Foveon
A neural sensor translator. Takes a photo from any<br>Bayer-array camera and renders it as if it were shot on a Sigma DP2<br>Merrill — the Foveon X3 stacked-sensor look, with<br>the colour and microdetail Foveon is famous for, on hardware you already<br>own.
Under the hood: a modified U-Net with an extra layer injected<br>between the encoder bottleneck and the upsampling decoder. The injected<br>channel carries a one-dimensional encoding of three-layer pixel-stack<br>structure — the B·G·R photodiode column that a Foveon<br>sensor captures and a Bayer sensor can’t. Trained end-to-end against<br>matched Bayer → Merrill scene pairs.
U-Net+1D
Modified U-Net<br>with 3-layer pixel<br>injection at bottleneck
Bayer → X3
Bayer CFA in,<br>Foveon X3 stacked-<br>sensor look out
DP2 Merrill
Trained on matched<br>scene pairs against<br>Sigma DP2 Merrill
⤓ Download Foveon.dmg
macOS 13+ · Apple Silicon
33 MB · signed DMG installer · Gatekeeper shows an unverified-developer<br>warning: right-click the app and choose Open the first time
Foveon — macOS app. Choose a photo, drag the sliders, save the result.
What it is
Most digital cameras capture colour through a Bayer colour filter<br>array: each photosite sees only one of R, G, or B, and the other<br>two channels are interpolated from the neighbours (demosaiced). It’s<br>efficient, but it costs you. The interpolation introduces colour fringing<br>on sharp edges, smears fine detail, and produces the “digital”<br>micro-contrast that even high-end Bayer cameras can’t fully shake.
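To make the cost of interpolation concrete, here is a minimal sketch of Bayer sampling in plain Python. It assumes an RGGB tile order and a naive four-neighbour average for the missing green value; real cameras use other tile orders and far more sophisticated demosaicing, and the function names are illustrative only.

```python
# Sketch: sample an RGB image through an RGGB Bayer mosaic, then
# interpolate a missing green value. Assumptions: RGGB tile order and a
# naive 4-neighbour average (not any specific camera's demosaicer).

def bayer_channel(row, col):
    """Return the channel (0=R, 1=G, 2=B) recorded at photosite (row, col)."""
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2

def mosaic(rgb):
    """Keep one channel per photosite; the other two become None."""
    out = []
    for r, line in enumerate(rgb):
        out_line = []
        for c, pixel in enumerate(line):
            keep = bayer_channel(r, c)
            out_line.append(tuple(v if i == keep else None
                                  for i, v in enumerate(pixel)))
        out.append(out_line)
    return out

def interpolate_green(mosaiced, row, col):
    """Estimate green at (row, col) as the mean of its 4-neighbour green samples."""
    height, width = len(mosaiced), len(mosaiced[0])
    samples = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < height and 0 <= c < width and mosaiced[r][c][1] is not None:
            samples.append(mosaiced[r][c][1])
    return sum(samples) / len(samples)

# A 4x4 grey image with a hard vertical edge between columns 1 and 2.
image = [[(0, 0, 0) if c < 2 else (100, 100, 100) for c in range(4)]
         for r in range(4)]
estimate = interpolate_green(mosaic(image), 2, 2)  # red photosite on the bright side
print(estimate, "estimated vs", image[2][2][1], "true")  # 75.0 estimated vs 100 true
```

The red photosite at (2, 2) sits on the bright side of the edge, but one of its four green neighbours lies on the dark side, so the estimate is pulled down to 75. This is the edge smearing described above, in miniature.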
The Foveon X3 sensor (Sigma’s now-rare design,<br>used in the DP1, DP2, and DP3 Merrill cameras) works the way colour<br>film does. Three photodiode layers are stacked vertically at every<br>pixel position: the top layer absorbs blue, the middle layer green, the<br>bottom layer red. Every pixel captures full colour. No interpolation,<br>no demosaicing artefacts, no false detail. The result is the<br>“Foveon look”: extraordinary microdetail and a particular colour<br>rendition (warm, dimensional, almost slide-film) that people<br>build entire camera systems around.
Foveon (the app) is a neural network that learns the<br>mapping between the two. Feed it a JPEG or RAW from a normal Bayer camera<br>(phone, mirrorless, DSLR) and it predicts what the same scene would look<br>like shot on a Foveon X3 sensor. Geometry stays the same; colour,<br>tonality, and micro-detail rendering shift toward the Merrill side of<br>the training distribution.
Bayer vs Foveon — the structural problem
Bayer CFA
One colour per pixel
Per 2×2 tile: R ×1 · G ×2 · B ×1
Each photosite captures exactly one colour. The other two<br>channels are guessed from the neighbours. The guess is what<br>creates the “digital” signature.
Foveon X3
Three layers per pixel
red (bottom)<br>green (middle)<br>blue (top)<br>stacked photodiodes, one per pixel
Every pixel records R, G, and B separately at the same<br>location. No interpolation. No false colour. The dimensional<br>quality Merrill shooters chase.
The architecture
The core is a standard convolutional U-Net: an encoder<br>that downsamples the input image into a compact feature bottleneck,<br>paired with a decoder that upsamples back to full resolution, with<br>skip connections at every level so fine spatial detail survives<br>the trip through the bottleneck.
The modification is a single new layer dropped in between the encoder’s<br>final downsampling block and the decoder’s first upsampling block. This<br>1D pixel-stack injection layer concatenates onto the bottleneck features a<br>one-dimensional encoding of how light is absorbed with depth in the silicon of<br>a real Foveon sensor: blue first, then green, then red. The<br>decoder learns to use this prior to reconstruct the kind of inter-channel<br>coupling that real X3 captures exhibit, with chroma that’s registered<br>with luminance instead of interpolated against it.
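The “blue first, then green, then red” ordering follows from exponential absorption in silicon: shorter wavelengths are absorbed closer to the surface. The sketch below illustrates that ordering with a simple Beer-Lambert model. It is not the app’s actual encoding, which is not published here. The absorption lengths and layer boundaries are order-of-magnitude placeholders chosen for illustration, not measured values for any Sigma sensor, and all names are hypothetical.

```python
import math

# Illustrative placeholders only (micrometres), NOT measured sensor data.
ABSORPTION_LENGTH_UM = {"blue": 0.4, "green": 1.5, "red": 3.5}
LAYER_BOUNDS_UM = [("top", 0.0, 0.4), ("middle", 0.4, 1.5), ("bottom", 1.5, 5.0)]

def absorbed_fraction(colour, z_start, z_end):
    """Fraction of incoming light absorbed between two depths (Beer-Lambert)."""
    length = ABSORPTION_LENGTH_UM[colour]
    return math.exp(-z_start / length) - math.exp(-z_end / length)

def depth_profile(colour):
    """One-dimensional profile: absorbed fraction in top, middle, bottom layers."""
    return [absorbed_fraction(colour, z0, z1) for _, z0, z1 in LAYER_BOUNDS_UM]

def dominant_layer(colour):
    """Name of the layer that absorbs the most light of this colour."""
    profile = depth_profile(colour)
    return LAYER_BOUNDS_UM[profile.index(max(profile))][0]

for colour in ("blue", "green", "red"):
    print(colour, dominant_layer(colour), [round(f, 3) for f in depth_profile(colour)])
```

With these placeholder numbers, blue peaks in the top layer, green in the middle, and red in the bottom, yet every colour leaves some signal in every layer. That overlap is one plausible reading of the inter-channel coupling the decoder is asked to reproduce.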
U-Net architecture diagram. Encoder ENC1 (C=64) → ENC2 (C=128) →<br>ENC3 (C=256) → ENC4 (C=512) → Bottleneck (C=1024) → 1D Pixel-Stack<br>Injection (B·G·R depth prior) → DEC4 (C=512) → DEC3 (C=256) →<br>DEC2 (C=128) → DEC1 (C=64) → output, with skip connections from each<br>encoder level to the matching decoder level.
Encoder (blue) downsamples the Bayer input. The<br>1D injection layer (orange) concatenates the Foveon<br>B·G·R depth prior at the bottleneck. Decoder<br>(purple) upsamples back to full resolution. Skip connections<br>(dashed) carry pre-bottleneck spatial detail across to the matching decoder<br>level — standard U-Net, drawn here for completeness. The novel piece<br>is the orange block.
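To see what concatenating at the bottleneck does to tensor shapes, the sketch below traces channel counts and spatial sizes through the diagram. The channel widths come from the diagram. The 2× downsampling per level and the three-channel prior (one per B, G, R layer) are assumptions made for illustration, since the page does not state either.

```python
# Channel widths taken from the architecture diagram.
ENCODER_CHANNELS = [64, 128, 256, 512]   # ENC1..ENC4
BOTTLENECK_CHANNELS = 1024
PRIOR_CHANNELS = 3  # ASSUMPTION: one injected channel per B, G, R layer

def trace_shapes(height, width):
    """Return (name, channels, height, width) for each stage of the network."""
    shapes = []
    h, w = height, width
    for level, channels in enumerate(ENCODER_CHANNELS, start=1):
        shapes.append((f"ENC{level}", channels, h, w))
        h, w = h // 2, w // 2            # ASSUMPTION: 2x downsample per level
    shapes.append(("Bottleneck", BOTTLENECK_CHANNELS, h, w))
    # Concatenation adds channels and leaves the spatial size unchanged.
    shapes.append(("Injection", BOTTLENECK_CHANNELS + PRIOR_CHANNELS, h, w))
    for level, channels in zip(range(4, 0, -1), reversed(ENCODER_CHANNELS)):
        h, w = h * 2, w * 2              # mirror the encoder on the way up
        shapes.append((f"DEC{level}", channels, h, w))
    return shapes

for name, channels, h, w in trace_shapes(256, 256):
    print(f"{name:<10} C={channels:<5} {h}x{w}")
```

Under these assumptions a 256×256 input reaches the bottleneck at 16×16, the injection widens it from 1024 to 1027 channels, and each decoder level ends up at the same spatial size as the encoder level whose skip connection it receives.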
Why inject at the bottleneck
The encoder has just stripped away spatial resolution to focus on<br>semantic content; the decoder is about to reconstruct it.<br>That’s exactly the moment to inject the prior that says<br>“reconstruct as if the sensor were stacked, not mosaiced.”<br>Inject earlier and the encoder learns to ignore it; inject later<br>and the decoder has already committed to a demosaic-style...