Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

OpenSCAD LLM Benchmark: Building the Pantheon | ModelRift Blog

We ran a small practical benchmark: give several AI coding tools the same kind of task and ask them to build the Pantheon in OpenSCAD.

ModelRift generates OpenSCAD for every 3D model on the platform. The LLM’s ability to handle spatial geometry directly affects what we can ship, so we track how models improve on this kind of task.

The goal was to see how well each system could turn architectural reference material into parametric CAD code, using the OpenSCAD CLI to render previews and iterate.

The prompt was intentionally visual and architectural: build the Pantheon from reference images, including the rotunda, dome, portico, columns, pediment, and recognizable front details.

Overview of the six current benchmark results. Each thumbnail is labeled with the client and model used for that run.

Why Pantheon?

This was not a basic OpenSCAD syntax test. All of the current coding LLMs can produce a simple “cube with a hole” model in OpenSCAD perfectly well. That kind of prompt mostly tests whether the model knows difference(), cube(), and cylinder().

The Pantheon is more useful as a benchmark because it sits in a middle ground. OpenSCAD is not a good fit for natural sculpted models, organic surfaces, or character-like geometry. It is much better at Boolean operations, radial symmetry, extrusions, and clean constructive shapes. The Pantheon has a large radial rotunda and dome, a central oculus, straight portico faces, columns, stepped bases, and a triangular pediment. That mix makes it illustrative without being impossible.
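To make the radial-symmetry and Boolean points concrete, here is a minimal Python sketch that emits OpenSCAD source for a hollow drum, a dome shell with an oculus, and a ring of pillars. It is not taken from any benchmark run. The function names (`ring_positions`, `rotunda_scad`) and every dimension are hypothetical placeholders, not real Pantheon measurements. The emitted text uses only standard OpenSCAD primitives: difference(), cylinder(), sphere(), cube(), and translate().

```python
import math


def ring_positions(count, radius):
    """Return (x, y) points evenly spaced on a circle around the origin."""
    step = 2 * math.pi / count
    return [
        (round(radius * math.cos(i * step), 3), round(radius * math.sin(i * step), 3))
        for i in range(count)
    ]


def rotunda_scad(radius=20.0, height=15.0, wall=2.0, oculus=4.0, pillars=8):
    """Build OpenSCAD source text. All dimensions are made-up placeholders."""
    inner = radius - wall
    lines = [
        "$fn = 96;",
        "// Hollow drum: outer cylinder minus a slightly taller inner one.",
        "difference() {",
        f"  cylinder(h={height}, r={radius});",
        f"  translate([0, 0, -1]) cylinder(h={height + 2}, r={inner});",
        "}",
        "// Dome shell: outer sphere minus inner sphere, lower half, and oculus.",
        f"translate([0, 0, {height}]) difference() {{",
        f"  sphere(r={radius});",
        f"  sphere(r={inner});",
        f"  translate([0, 0, {-(radius + 1) / 2}])",
        f"    cube([{2 * radius + 2}, {2 * radius + 2}, {radius + 1}], center=true);",
        f"  cylinder(h={radius + 1}, r={oculus});",
        "}",
        "// Pillars placed by angle: one position per step around the ring.",
    ]
    for x, y in ring_positions(pillars, inner - 3):
        lines.append(f"translate([{x}, {y}, 0]) cylinder(h={height}, r=1);")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(rotunda_scad())
```

Saving the printed text as a .scad file gives a recognizable domed drum in a few dozen lines, which is roughly the floor every tested agent cleared. The hard part of the benchmark is what this sketch leaves out: the portico, the pediment, and getting their proportions right against the rotunda.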

It is also recognizable. A weak result still looks vaguely like a domed building, but a better result has to get the relationship between the round drum, the rectangular portico, the dome rings, and the front facade roughly right.

Prompt

The prompt used for the benchmark was:

see two ref images and build .scad file with openscad implementation of pantheon. use openscad CLI (available) to preview your work (by rendering openscad model to .png) and iterate until you are happy with the result.

Reference Images

Reference #1 is the front facade view on the left. Reference #2 is the aerial/top view on the right. The combined image was generated with ffmpeg from the two source images used in the benchmark.

Results

| Tool and model | Time | Quality | Summary | Link |
| --- | --- | --- | --- | --- |
| Cursor 3.5 / Composer 2.5 | ●●●●● 5/5, fastest | ●○○○○ 1.4/5 | Quickest run, but the weakest output. It captured a dome and portico, but the proportions, color discipline, and architectural details were the poorest. | Explore 3D result |
| Codex 5.5 High | ●●●●○ 4/5, baseline | ●●●○○ 3.0/5 | Strong detail density, including the inscription on the entablature. The main issue was a mismatch between preview renders and the final STL. | Explore 3D result |
| Claude Code 2.1 / Opus 4.7 | ●●○○○ 2/5, slower | ●●●○○ 3.0/5 | Better structure than Cursor, with a clearer portico and stepped base, but too monochrome and less convincing than the stronger runs. | Explore 3D result |
| Claude Code 2.1 / Sonnet 4.6 | ●○○○○ 1/5, slowest | ●●●◐○ 3.4/5 | The model had clean massing, balanced proportions, and the most plausible overall read among the original autonomous batch, but took the longest to implement. | Explore 3D result |
| Google Antigravity 2.0 / Gemini 3.5 Flash High (Best autonomous result) | ●○○○○ 1/5, around 12 min | ●●●●◐ 4.5/5 | Strongest autonomous output. It used real Pantheon dimensions, included the inscription, and was the only agent to implement the signature interior coffered ceiling pattern. | Explore 3D result |
| ModelRift / Gemini Flash 3.0 (Human-in-the-loop winner) | ●○○○○ 1/5, about 10 min | ●●●◐○ 3.8/5 | Best non-autonomous result. It used ModelRift’s iterative annotation workflow with Gemini Flash 3.0 and took about 2x the Claude Code time. | Explore 3D result |

The scores are relative to this benchmark only. They are not general model rankings, and the time score reflects observed implementation time, not project publication timestamps. The quality scores are intentionally conservative: even the best result is not close to a perfect Pantheon model.

Workflow Notes

The client workflow mattered almost as much as the model. Codex Desktop shows the images that the LLM has loaded into context directly inside the conversation. For visual CAD work, that is very convenient: you can see whether the agent is actually using the same references you intended. Cursor Agent and Claude Code CLI were workable, but their process views made visual context less explicit.

All tested systems handled the local OpenSCAD toolchain well. OpenSCAD was installed on the test Mac and available on PATH, and every agent used it successfully to render PNG previews during iteration. The limiting factor was not tool access. It was geometric judgment, camera setup, and whether a previewed model exported into a clean final mesh.
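One way to narrow the gap between a preview and the exported mesh is to render the PNG with full geometry evaluation before exporting. The sketch below is an assumption about how such a loop might be scripted, not the commands any agent actually ran. The helper names and file names are hypothetical. The options `-o`, `--imgsize`, `--autocenter`, `--viewall`, and `--render` are OpenSCAD command-line options in recent releases, but check `openscad --help` for your installed version.

```python
import shutil
import subprocess


def preview_command(scad, png, size=(1024, 768), full_render=False):
    """Command that renders a .scad file to a PNG preview image."""
    cmd = ["openscad", "-o", png, f"--imgsize={size[0]},{size[1]}",
           "--autocenter", "--viewall"]
    if full_render:
        # Evaluate full geometry instead of the fast preview, so the image
        # is closer to what an exported mesh will contain.
        cmd.append("--render")
    cmd.append(scad)
    return cmd


def export_command(scad, stl):
    """Command that exports the mesh; OpenSCAD picks the format from the extension."""
    return ["openscad", "-o", stl, scad]


def run(cmd):
    """Run a command if its binary is on PATH; return the exit code, else None."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True).returncode


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(preview_command("pantheon.scad", "preview.png", full_render=True)))
    print(" ".join(export_command("pantheon.scad", "pantheon.stl")))
```

Passing either command list to `run()` executes it when OpenSCAD is installed. Comparing a `--render` PNG against the fast preview before exporting is a cheap way to catch the kind of preview-versus-STL mismatch noted in the results.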

Codex also made the preview iteration easier to follow. It exposed the reference images,...
