PyCon US 2026 Typing Summit Recap


Bernát Gábor · Python packaging, tox, virtualenv & open source

The PyCon US 2026 Typing Summit ran Thursday, May 14, 2026, from 1 PM to 5 PM in Room 201A of the Long Beach Convention Center, the day before the main conference started. Eight talks plus a Typing Council Q&A, single track. This recap is for anyone who could not be in the room.

TLDR:
Guido van Rossum argued that PEP 484’s no-new-syntax rule is already broken in practice and that the field should weigh user pain over power features, citing the 2025 Python Typing Survey.
Jelle Zijlstra proposed adding intersection and restricted-negation types to the typing spec, with an inhabitation check as the load-bearing new rule.
Michael Sullivan presented PEP 827 (Vercel) for type manipulation, modelled on TypeScript’s conditional and mapped types.
Douglas Creager showed how ty represents generic-call constraints internally with ternary decision diagrams, and a third solver strategy that fixes a 9-line partial(choose, None) example every production checker today gets wrong.
Conner Nilsen presented a Pyrefly experiment with AI coding agents: type checking moves success on well-typed Meta code from 79.6% to 83.9% with 21% fewer steps; no measurable help on lightly-typed SWE-bench Verified.
Avik Chaudhuri demoed tensor-shape types in Pyrefly, blocked in practice by PEP 695’s eager evaluation of type parameters.
Jia Chen presented a Lean 4 formalization (Featherweight Python) with mechanized soundness and decidability proofs; AI assistants turned what used to take years into weeks.
The Typing Council panel (Carl Meyer, Jelle Zijlstra, Rebecca Chen on stage) opened the floor to attendee questions on governance, error-code consistency, metaprogramming, and the spec direction.

Experiments with AI agents and Pyrefly type errors — Conner Nilsen

Conner (Meta, Pyrefly team) presented two questions: (1) does giving an AI coding agent a type checker help it finish tasks, and (2) does it prevent the agent from re-introducing old bugs while fixing new ones? His team ran two benchmarks with and without type-checker feedback and tracked three metrics: success rate, number of steps to completion, and wall-clock duration.

The answer to question 1 depends on coverage.
Well-typed code (an internal Meta benchmark): success rate moved from 79.6% to 83.9%, with 21% fewer steps and 14% faster wall-clock runs. The type checker caught problems before the agent went exploring.
Lightly typed code (SWE-bench Verified over libraries like Django, SymPy, Matplotlib): no meaningful improvement. The agent spent steps on type errors in code adjacent to the task, fixing import mismatches and missing attributes unrelated to the assigned bug.

The answer to question 2 was yes: with the type checker in the loop, the agent stopped re-introducing previously fixed bugs when working on new ones.

Two findings on delivery mechanics:
Models do not use tools just because you mention them. Telling the agent “you can run the type checker” was not enough. The team wrapped Pyrefly invocations in a lightweight think-act-observe loop that ran the type checker after every edit and injected the result. With that wrapper, both models engaged with the errors. Without it, they did not.
Surface errors as a fresh conversation turn, not as edit-tool output. Errors returned inside the previous tool response got treated as noise. The same errors posted as a new turn got addressed.

Model sensitivity diverged. Claude Sonnet 4.5 chased every error the type checker emitted, which helped on clean code and hurt on noisy code: the model would fix unrelated nags before returning to the task.
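The two delivery findings above (run the checker after every edit, and post its output as a new turn) can be sketched as a small loop. This is a minimal illustration, not the team's wrapper: Conversation, run_loop, and the three callables are hypothetical names, and the type checker is a stub rather than a real Pyrefly invocation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Conversation:
    turns: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})


def run_loop(
    propose_edit: Callable[[Conversation], Optional[str]],
    apply_edit: Callable[[str], None],
    run_checker: Callable[[], list],
    max_steps: int = 10,
) -> Conversation:
    convo = Conversation()
    for _ in range(max_steps):
        edit = propose_edit(convo)  # think
        if edit is None:
            break
        apply_edit(edit)  # act
        convo.add("tool", "edit applied")
        errors = run_checker()  # observe: checker runs after every edit
        if errors:
            # Post errors as a fresh turn instead of appending them
            # to the edit tool's own output.
            convo.add("user", "Type checker reported:\n" + "\n".join(errors))
    return convo


# Toy stand-ins (assumptions): the first edit leaves one error behind,
# and the "agent" fixes it once the error shows up as a new turn.
state = {"errors": ["demo.py:1: bad-return"]}


def propose_edit(convo: Conversation) -> Optional[str]:
    if not convo.turns:
        return "initial edit"
    if convo.turns[-1]["role"] == "user":
        return "fix type error"
    return None


def apply_edit(edit: str) -> None:
    if edit == "fix type error":
        state["errors"] = []


def run_checker() -> list:
    return list(state["errors"])


convo = run_loop(propose_edit, apply_edit, run_checker)
```

In this toy run the conversation ends up with three turns: the first edit, the checker's report as its own turn, and the follow-up fix. A real wrapper would replace run_checker with an actual checker invocation and propose_edit with a model call.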
GPT-5 codex stayed goal-oriented and ignored errors unless the wrapper forced them into the conversation.

The open-questions slide flagged the follow-up: SWE-bench Verified is the wrong benchmark for this question, because the tasks themselves rarely require typing reasoning. A benchmark drawn from heavily typed projects would tell you whether type checking is load-bearing for the agent or helpful at the margin.

The ty implementation of constraint sets — Douglas Creager

Doug (Astral, billed on his title slide as “an OpenAI joint”) walked through how ty represents the state of a generic function call. The vehicle was a 9-line program that runs fine but that every production type checker rejects, demonstrated live via multiplay, Astral’s tool for running a Python snippet through several type checkers side by side. Slides: dcreager/presentations

import random
from collections.abc import Callable

def choose[A](a1: A, a2: A) -> A:
    return random.choice([a1, a2])

def partial[X, Y, Z](fn: Callable[[X, Y], Z], x: X) -> Callable[[Y], Z]: ...

p = partial(choose, None)
p(2)        # type checker: error. argument 2 is not None.
p("hello")  # ditto.
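To see the runtime behaviour that the checkers reject, here is a runnable variant. Two things in it are assumptions rather than the talk's code: the body of partial (the snippet above elides it with ...), and the older TypeVar spelling, used so the sketch also runs on interpreters before Python 3.12.

```python
import random
from collections.abc import Callable
from typing import TypeVar

A = TypeVar("A")
X = TypeVar("X")
Y = TypeVar("Y")
Z = TypeVar("Z")


def choose(a1: A, a2: A) -> A:
    return random.choice([a1, a2])


def partial(fn: Callable[[X, Y], Z], x: X) -> Callable[[Y], Z]:
    # Assumed body: bind the first argument, leave the second open.
    return lambda y: fn(x, y)


p = partial(choose, None)

# Call p many times and collect every value it returns.
seen_int = {p(2) for _ in range(100)}
seen_str = {p("hello") for _ in range(100)}
```

At runtime each call returns either None or the argument passed to p, so nothing fails; the disagreement is only in what the checker infers for A.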

The direct call choose(None, 2) type-checks fine: A solves to None | Literal[2] (or None | int, depending on<br>the checker). Routing it through...
