Foveon – Bayer to Foveon X3, learned, Mac App using deep learning


Foveon — turn a Bayer photo into a Foveon X3 photo

Foveon

A neural sensor translator. Takes a photo from any Bayer-array camera and renders it as if it were shot on a Sigma DP2 Merrill: the Foveon X3 stacked-sensor look, with the colour and microdetail Foveon is famous for, on hardware you already own.

Under the hood: a modified U-Net with an extra layer injected between the encoder bottleneck and the upsampling decoder. The injected layer carries a one-dimensional encoding of the three-layer pixel-stack structure, the B·G·R photodiode column that a Foveon sensor captures and a Bayer sensor can’t. The network is trained end-to-end on matched Bayer → Merrill scene pairs.

U-Net+1D

Modified U-Net with 3-layer pixel injection at bottleneck

Bayer → X3

Bayer CFA in, Foveon X3 stacked-sensor look out

DP2 Merrill

Trained on matched scene pairs against Sigma DP2 Merrill

⤓ Download Foveon.dmg

macOS 13+ · Apple Silicon

33 MB · signed DMG installer · Gatekeeper flags the developer as unverified, so right-click and choose Open the first time

Foveon — macOS app. Choose a photo, drag the sliders, save the result.

What it is

Most digital cameras capture colour through a Bayer colour filter array: each photosite sees only one of R, G, or B, and the other two channels are interpolated from its neighbours (demosaiced). It’s efficient, but it costs you. The interpolation introduces colour fringing on sharp edges, smears fine detail, and produces the “digital” micro-contrast that even high-end Bayer cameras can’t fully shake.
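The fringing described above can be reproduced on a toy image. The following is a minimal, dependency-free sketch, assuming an RGGB tile layout and a deliberately crude neighbour-averaging demosaic; the function names (`bayer_channel`, `mosaic`, `demosaic`) are illustrative and are not taken from the app, and real cameras use far more sophisticated interpolation.

```python
def bayer_channel(y, x):
    """Channel recorded at photosite (y, x) in an assumed RGGB tile: 0=R, 1=G, 2=B."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2


def mosaic(rgb):
    """Keep one channel per photosite. rgb is a list of rows of (r, g, b) tuples."""
    return [[px[bayer_channel(y, x)] for x, px in enumerate(row)]
            for y, row in enumerate(rgb)]


def demosaic(raw):
    """Crude demosaic: keep the measured channel, and fill each missing channel
    with the mean of that channel's samples in the clipped 3x3 neighbourhood."""
    h, w = len(raw), len(raw[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            px = []
            for c in range(3):
                if bayer_channel(y, x) == c:
                    px.append(raw[y][x])  # this channel was actually measured
                    continue
                vals = [raw[j][i]
                        for j in range(max(0, y - 1), min(h, y + 2))
                        for i in range(max(0, x - 1), min(w, x + 2))
                        if bayer_channel(j, i) == c]
                px.append(sum(vals) / len(vals))  # this channel is a guess
            row.append(tuple(px))
        out.append(row)
    return out


# A hard vertical edge: two black columns, then two white columns.
black, white = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
edge = [[black, black, white, white] for _ in range(4)]
restored = demosaic(mosaic(edge))
print(restored[0][1])  # a pixel that was pure black in the original scene
```

In this toy case the black pixel next to the edge comes back with a non-zero red value, because its red channel was averaged from one black and one white neighbour. That is the colour fringing the paragraph refers to, in miniature.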

The Foveon X3 sensor, Sigma’s now-rare design used in the DP1, DP2, and DP3 Merrill cameras, works the way colour film does. Three photodiode layers are stacked vertically at every pixel position. The top layer absorbs blue, the middle layer green, the bottom layer red. Every pixel captures full colour. No interpolation, no demosaicing artefacts, no false detail. The result is the “Foveon look”: extraordinary microdetail and a particular colour rendition (warm, dimensional, almost slide-film) that people build entire camera systems around.

Foveon (the app) is a neural network that learns the mapping between the two. Feed it a JPEG or RAW from a normal Bayer camera (phone, mirrorless, DSLR) and it predicts what the same scene would look like shot on a Foveon X3 sensor. Geometry stays the same; colour, tonality, and micro-detail rendering shift toward the Merrill side of the training distribution.

Bayer vs Foveon — the structural problem

Bayer CFA

One colour per pixel

G ×2

Each photosite captures exactly one colour. The other two channels are guessed from the neighbours. The guess is what creates the “digital” signature.

Foveon X3

Three layers per pixel

Blue (top), green (middle), red (bottom): stacked photodiodes, one stack per pixel

Every pixel records R, G, and B separately at the same location. No interpolation. No false colour. The dimensional quality Merrill shooters chase.

The architecture

The core is a standard convolutional U-Net: an encoder that downsamples the input image into a compact feature bottleneck, paired with a decoder that upsamples back to full resolution, with skip connections at every level so fine spatial detail survives the trip through the bottleneck.

The modification is a single new layer dropped in between the encoder’s final downsampling block and the decoder’s first upsampling block: a 1D pixel-stack injection layer that concatenates a one-dimensional encoding of how each colour is absorbed at increasing silicon depth on a real Foveon sensor (blue first, then green, then red). The decoder learns to use this prior to reconstruct the kind of inter-channel coupling that real X3 captures exhibit: chroma that’s registered with luminance instead of interpolated against it.
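The page does not say how the one-dimensional encoding is shaped or how it is joined to the bottleneck features, so the following is only a sketch of one plausible reading: each value of the 1D prior is tiled into a constant plane and appended on the channel axis. The function name `inject_stack_prior` and the prior values are hypothetical, chosen for illustration, and are not physical measurements or the app's actual weights.

```python
def inject_stack_prior(features, prior):
    """Append a 1D prior to a feature map along the channel axis.

    features: list of C channel planes, each a list of H rows of W floats.
    prior:    list of K scalars (assumed here: one per B, G, R layer).
    Returns a new list of C + K planes; the input list is not modified.
    """
    h, w = len(features[0]), len(features[0][0])
    # Tile each scalar into a constant H x W plane (an assumed broadcasting scheme).
    prior_planes = [[[value] * w for _ in range(h)] for value in prior]
    return features + prior_planes


# Toy bottleneck: 4 channels of 2 x 2 features (a real one would be far larger).
bottleneck = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(4)]

# Hypothetical normalised depth order: blue (top), green (middle), red (bottom).
depth_prior = [0.0, 0.5, 1.0]

injected = inject_stack_prior(bottleneck, depth_prior)
print(len(bottleneck), "->", len(injected), "channels")
```

Under this reading the decoder's first block simply sees three extra input channels. Whether the real layer is fixed or learned, and whether it varies spatially, is not stated in the source.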

U-Net architecture diagram. Encoder ENC1 (C=64) → ENC2 (C=128) → ENC3 (C=256) → ENC4 (C=512) → Bottleneck (C=1024) → 1D Pixel-Stack Injection (B·G·R depth prior) → DEC4 (C=512) → DEC3 (C=256) → DEC2 (C=128) → DEC1 (C=64) → output, with skip connections from each encoder level to the matching decoder level.
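To make the diagram concrete, the channel widths can be traced alongside spatial resolution. This is a rough sketch: the channel counts come from the diagram, while the 2x downsampling per level and the three extra prior channels are assumptions (both are conventional choices, but neither is confirmed by the page). The name `trace_shapes` is illustrative.

```python
ENCODER_CHANNELS = [64, 128, 256, 512]  # ENC1..ENC4, as listed in the diagram
BOTTLENECK_CHANNELS = 1024              # as listed in the diagram
PRIOR_CHANNELS = 3                      # assumption: one channel per B, G, R layer


def trace_shapes(height, width):
    """Return (stage, channels, height, width) for each stage of the network."""
    stages = []
    h, w = height, width
    for level, channels in enumerate(ENCODER_CHANNELS, start=1):
        stages.append((f"ENC{level}", channels, h, w))
        h, w = h // 2, w // 2  # assumed 2x downsampling after each encoder level
    stages.append(("Bottleneck", BOTTLENECK_CHANNELS, h, w))
    stages.append(("Injection", BOTTLENECK_CHANNELS + PRIOR_CHANNELS, h, w))
    for level, channels in reversed(list(enumerate(ENCODER_CHANNELS, start=1))):
        h, w = h * 2, w * 2    # assumed 2x upsampling before each decoder level
        stages.append((f"DEC{level}", channels, h, w))
    return stages


for name, channels, h, w in trace_shapes(256, 256):
    print(f"{name:<10} C={channels:<5} {h}x{w}")
```

Each DEC level ends up at the same resolution as the ENC level with the same number, which is what lets the skip connections join matching feature maps.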

Encoder (blue) downsamples the Bayer input. The 1D injection layer (orange) concatenates the Foveon B·G·R depth prior at the bottleneck. Decoder (purple) upsamples back to full resolution. Skip connections (dashed) carry pre-bottleneck spatial detail across to the matching decoder level; this is standard U-Net, drawn here for completeness. The novel piece is the orange block.

Why inject at the bottleneck

The encoder has just stripped away spatial resolution to focus on semantic content; the decoder is about to reconstruct it. That’s exactly the moment to inject the prior that says “reconstruct as if the sensor were stacked, not mosaiced.” Inject earlier and the encoder learns to ignore it; inject later and the decoder has already committed to a demosaic-style...
