Step 3.7 Flash


2026-05-29


The new frontier is agent efficiency.

A high-efficiency Flash model for real-world agents.

Multimodal Understanding & Action｜Web & Visual Search Enhancement｜Reliable Tool Use & Orchestration｜Agent Ecosystem Compatibility

GitHub

HuggingFace

ModelScope

Key Features

Native Multimodal Understanding & Acting

Understands images across the full range — product UIs, documents, charts, and natural scenes — then writes code or calls tools to act on what it sees.

Web & Visual Search Enhancement

Web search reaches further — more sources, deeper follow-up. Visual search recognizes what other systems don't — long-tail entities, freshly emerged concepts.

Reliable Tool Use & Orchestration

Drives terminals, browsers, Office tools, search, and more, staying coherent over long runs: less drift, fewer broken tool calls, fewer failed runs.

Agent Ecosystem Compatibility

Works with mainstream harnesses (Claude Code, KiloCode, Hermes Agent, OpenClaw) and Skills — lower integration cost, less workflow rewiring.

Agentic Coding

SWE-Bench Pro

Step 3.7 Flash: 56.3 (Params: 196B)
Step 3.5 Flash: 51.3 (Params: 196B)
DeepSeek V4 Flash: 55.6 (Params: 284B)
Gemini 3.5 Flash: 55.1 (Params: Unknown)
GPT 5.5: 58.6 (Params: Unknown)
Claude Opus 4.7: 64.3 (Params: Unknown)

Terminal-Bench 2.1

Step 3.7 Flash: 59.5 (Params: 196B)
Step 3.5 Flash: 53.4 (Params: 196B)
DeepSeek V4 Flash: 62.0 (Params: 284B)
Gemini 3.5 Flash: 76.2 (Params: Unknown)
GPT 5.5: 82.7 (Params: Unknown)
Claude Opus 4.7: 69.4 (Params: Unknown)

Multimodal

SimpleVQA (with Tool)

Step 3.7 Flash: 79.2 (Params: 196B)
GLM 5V Turbo: 78.2 (Params: Unknown)
Kimi K2.6: 78.2 (Params: Unknown)
GPT 5.5: 79.1 (Params: Unknown)

V* (with Python)

Step 3.7 Flash: 95.3 (Params: 196B)
GLM 5V Turbo: 89.0 (Params: Unknown)
Kimi K2.6: 96.9 (Params: Unknown)
Gemini 3 Flash: 96.3 (Params: Unknown)

General Agent

GDPval

Step 3.7 Flash: 45.8 (Params: 196B)
Step 3.5 Flash: 28.0 (Params: 196B)
DeepSeek V4 Flash: 44.0 (Params: 284B)
Gemini 3.5 Flash: 57.8 (Params: Unknown)
GPT 5.5: 63.0 (Params: Unknown)
Claude Opus 4.7: 63.0 (Params: Unknown)

Toolathlon

Step 3.7 Flash: 49.5 (Params: 196B)
Step 3.5 Flash: 33.3 (Params: 196B)
DeepSeek V4 Flash: 52.8 (Params: 284B)
Gemini 3.5 Flash: 56.5 (Params: Unknown)
GPT 5.5: 60.2 (Params: Unknown)
Claude Opus 4.7: 65.4 (Params: Unknown)

ClawEval-1.1 (2026-05-09)

Step 3.7 Flash: 67.1 (Params: 196B)
Step 3.5 Flash: 43.6 (Params: 196B)
DeepSeek V4 Flash: 57.8 (Params: 284B)
Gemini 3.1 Pro: 57.8 (Params: Unknown)
GPT 5.4: 60.3 (Params: Unknown)
Claude Opus 4.6: 70.8 (Params: Unknown)

HLE (with Tool)

Step 3.7 Flash: 47.2 (Params: 196B)
Step 3.5 Flash: 35.7 (Params: 196B)
DeepSeek V4 Flash: 45.1 (Params: 284B)
Gemini 3.5 Flash: 40.2 (Params: Unknown)
GPT 5.5: 52.2 (Params: Unknown)
Claude Opus 4.7: 54.7 (Params: Unknown)

Note: On non-multimodal tasks, comparisons are organized in two groups. The left panel compares Step 3.7 Flash with DeepSeek V4 Flash, an open-source Flash model of comparable scale; the right panel places Step 3.7 Flash alongside frontier closed-source models. Step 3.7 Flash, Gemini 3.5 Flash, and DeepSeek V4 Flash are evaluated on Terminal-Bench 2.1 (the DeepSeek V4 Flash score is self-tested), while GPT 5.5 and Claude Opus 4.7 use official self-reported Terminal-Bench 2.0 scores. On GDPval, the Step 3.7 Flash score is obtained through internal pairwise evaluation, while scores for the comparison models are taken from the official Artificial Analysis Leaderboard.

Gallery

01 Landing Page
02 Heritage Building
03 Menu Recognition
04 Travel Guide
05 Deep Search
06 Draft to Code
07 Video to Summary
08 Sketch to Web Page

Agentic Coding

Foundation models are shifting from answering questions to taking action, and in the digital world that action takes the form of code. Coding is the substrate of digital agency, the purest form of the...
