The Next Frontier of Visual AI Is Code


Andreessen Horowitz

Yoko Li

Posted June 2, 2026


For the last few years, visual AI has mostly been judged by its pixels. The better the final image or video looked, the better the model seemed.

That made sense. Diffusion models turned text prompts into beautiful images, then videos, then increasingly realistic worlds. The obvious comparison point was Photoshop or a camera.

But for many visual tasks, such as graphic design, UI design, or 3D modeling, users want more than the final pixels. They want artifacts they can keep iterating on as feedback and new ideas arrive. A designer does not just need a mockup; they need layers, components, and handoff. An animator does not just need a video; they need timing curves, keyframes, and editable motion. A 3D artist does not just need a rendered picture; they need geometry, materials, lighting, cameras, and scene structure.

The most interesting visual AI tools today have stopped trying to generate the final output. Instead, they're generating the source code behind it. This change is unlocking editability, iteration, and a feedback loop that pixel-native models can't match.

The two stacks of visual generation

There are two major ways to think about visual generation.

The first is pixel-native generation. These systems generate images or videos directly, usually in latent space. They are great at texture, atmosphere, lighting, and realism. If the goal is to generate a cinematic shot, a beautiful moodboard, or a photorealistic image, diffusion models are still the dominant method.

The second is code-native generation. These systems generate a representation that is then executed or rendered by another engine. The model does not directly produce the final pixels; it produces the program that produces the pixels.

That program might be an SVG file, an HTML/CSS layout, a React component, a Lottie JSON file, a Blender script, a USD scene graph, a shader, or a game-engine scene. The visual output is still pixels at the end, but the source of truth is a structured representation.
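To make the idea of a structured source of truth concrete, here is a minimal sketch in Python, assuming a made-up badge graphic expressed as SVG. The helper names `build_badge`, `recolor`, and `to_source` are hypothetical and chosen only for illustration; they do not come from any tool discussed here. The point is that an edit targets a named element in the source, and the pixels follow when a renderer draws the result.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def build_badge(label: str, fill: str) -> ET.Element:
    """Build a tiny SVG scene: a rounded rectangle with a centered text label.

    Hypothetical example graphic; element ids are illustrative.
    """
    svg = ET.Element(
        "svg",
        {"xmlns": SVG_NS, "width": "200", "height": "80", "viewBox": "0 0 200 80"},
    )
    ET.SubElement(
        svg,
        "rect",
        {"id": "background", "x": "10", "y": "10",
         "width": "180", "height": "60", "rx": "12", "fill": fill},
    )
    text = ET.SubElement(
        svg,
        "text",
        {"id": "label", "x": "100", "y": "46",
         "text-anchor": "middle", "font-size": "18", "fill": "#ffffff"},
    )
    text.text = label
    return svg


def recolor(svg: ET.Element, element_id: str, fill: str) -> None:
    """Change the fill of one named element, leaving everything else untouched."""
    for node in svg.iter():
        if node.get("id") == element_id:
            node.set("fill", fill)
            return
    raise KeyError(f"no element with id {element_id!r}")


def to_source(svg: ET.Element) -> str:
    """Serialize the scene to SVG text that any SVG renderer can draw."""
    return ET.tostring(svg, encoding="unicode")


badge = build_badge("Hello", "#1f6feb")
before = to_source(badge)

# A targeted edit: only the background color changes; the label and layout persist.
recolor(badge, "background", "#d1242f")
after = to_source(badge)

print(after)
```

Saving the printed text to a `.svg` file and opening it in a browser renders the badge. A pixel-native system asked for the same color change would have to regenerate the whole image; here the change is a single attribute on a single element.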

This distinction matters because production workflows...
