PyCon US 2026 Typing Summit Recap

Bernát Gábor

The PyCon US 2026 Typing Summit ran Thursday May 14, 2026, from 1 PM to 5 PM in Room 201A of the Long Beach Convention Center, the day before the main conference started. Eight talks plus a Typing Council Q&A, single track. This recap is for anyone who could not be in the room.

TLDR:

- Guido van Rossum argued that PEP 484’s no-new-syntax rule is already broken in practice and that the field should weigh user pain over power features, citing the 2025 Python Typing Survey.
- Jelle Zijlstra proposed adding intersection and restricted-negation types to the typing spec, with an inhabitation check as the load-bearing new rule.
- Michael Sullivan presented PEP 827 (Vercel) for type manipulation, modelled on TypeScript’s conditional and mapped types.
- Douglas Creager showed how ty represents generic-call constraints internally with ternary decision diagrams, and a third solver strategy that fixes a 9-line partial(choose, None) example that every production checker today gets wrong.
- Conner Nilsen presented a Pyrefly experiment with AI coding agents: type checking moves success on well-typed Meta code from 79.6% to 83.9% with 21% fewer steps; no measurable help on lightly-typed SWE-bench Verified.
- Avik Chaudhuri demoed tensor-shape types in Pyrefly, blocked in practice by PEP 695’s eager evaluation of type parameters.
- Jia Chen presented a Lean 4 formalization (Featherweight Python) with mechanized soundness and decidability proofs; AI assistants turned what used to take years into weeks.
- The Typing Council panel (Carl Meyer, Jelle Zijlstra, Rebecca Chen on stage) opened the floor to attendee questions on governance, error-code consistency, metaprogramming, and the spec direction.

Experiments with AI agents and Pyrefly type errors — Conner Nilsen

Conner (Meta, Pyrefly team) presented two questions: (1) does giving an AI coding agent a type checker help it finish tasks, and (2) does it prevent the agent from re-introducing old bugs while fixing new ones? His team ran two benchmarks with and without type-checker feedback and tracked three metrics: success rate, number of steps to completion, and wall-clock duration.

The answer to question 1 depends on coverage.

- Well-typed code (an internal Meta benchmark): success rate moved from 79.6% to 83.9%, with 21% fewer steps and 14% faster wall-clock runs. The type checker caught problems before the agent went exploring.
- Lightly typed code (SWE-bench Verified over libraries like Django, SymPy, Matplotlib): no meaningful improvement. The agent spent steps on type errors in code adjacent to the task, fixing import mismatches and missing attributes unrelated to the assigned bug.

The answer to question 2 was yes: with the type checker in the loop, the agent stopped re-introducing previously fixed bugs when working on new ones.

Two findings on delivery mechanics:

- Models do not use tools just because you mention them. Telling the agent “you can run the type checker” was not enough. The team wrapped Pyrefly invocations in a lightweight think-act-observe loop that ran the type checker after every edit and injected the result. With that wrapper, both models engaged with the errors. Without it, they did not.
- Surface errors as a fresh conversation turn, not as edit-tool output. Errors returned inside the previous tool response got treated as noise. The same errors posted as a new turn got addressed.

Model sensitivity diverged. Claude Sonnet 4.5 chased every error the type checker emitted, which helped on clean code and hurt on noisy code: the model would fix unrelated nags before returning to the task. GPT-5 codex stayed goal-oriented and ignored errors unless the wrapper forced them into the conversation.
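The two delivery findings from this talk (run the checker after every edit, and post its output as a new conversation turn) can be sketched roughly as below. This is a minimal illustration and not the Pyrefly team's wrapper: `Message`, `run_with_type_feedback`, and the three callables `propose_edit`, `apply_edit`, and `run_checker` are hypothetical names introduced here, and the `"user"` and `"tool"` role labels are an assumption about how a chat-style agent history might be modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    role: str      # assumed labels: "tool" for tool output, "user" for a fresh turn
    content: str


def run_with_type_feedback(
    propose_edit: Callable[[List[Message]], Optional[str]],
    apply_edit: Callable[[str], None],
    run_checker: Callable[[], str],
    max_steps: int = 10,
) -> List[Message]:
    """Hypothetical think-act-observe loop around an agent and a type checker."""
    history: List[Message] = []
    for _ in range(max_steps):
        # Think: the agent proposes its next edit, or None when it is done.
        edit = propose_edit(history)
        if edit is None:
            break
        # Act: apply the edit and record a plain tool acknowledgement.
        apply_edit(edit)
        history.append(Message("tool", "edit applied"))
        # Observe: run the checker after every edit, whether or not the agent asked.
        errors = run_checker()
        if errors:
            # Post errors as their own turn rather than appending them to the
            # edit tool's response, where the talk reports they were ignored.
            history.append(Message("user", "Type checker output:\n" + errors))
    return history
```

In a real setup `run_checker` would presumably shell out to the type checker and return its text output, and `propose_edit` would call the model; both are left as injected callables here so the loop itself stays independent of any particular CLI or model API.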
The open-questions slide flagged the follow-up: SWE-bench Verified is the wrong benchmark for this question, because the tasks themselves rarely require typing reasoning. A benchmark drawn from heavily typed projects would tell you whether type checking is load-bearing for the agent or helpful at the margin.

The ty implementation of constraint sets — Douglas Creager

Doug (Astral, billed on his title slide as “an OpenAI joint”) walked through how ty represents the state of a generic function call. The vehicle was a 9-line program that runs fine but that every production type checker rejects, demonstrated live via multiplay, Astral’s tool for running a Python snippet through several type checkers side by side. Slides: dcreager/presentations

    import random
    from typing import Callable

    # Return one of the two arguments at random; both share the type variable A.
    def choose[A](a1: A, a2: A) -> A:
        return random.choice([a1, a2])

    # Bind the first argument of a two-argument callable (body elided in the source).
    def partial[X, Y, Z](fn: Callable[[X, Y], Z], x: X) -> Callable[[Y], Z]: ...

    p = partial(choose, None)
    p(2)        # type checker: error. argument 2 is not None.
    p("hello")  # ditto.

The direct call choose(None, 2) type-checks fine: A solves to None | Literal[2] (or None | int, depending on the checker). Routing it through...
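To check the runtime half of that claim, here is a sketch of the same program with the body of `partial` filled in. The closure body is an assumption, since the recap elides it with `...`, and the example uses `TypeVar` declarations instead of the PEP 695 syntax above only so that it also runs on interpreters older than Python 3.12. It says nothing about what any particular checker reports; it only shows that both calls execute without error.

```python
import random
from typing import Callable, TypeVar

A = TypeVar("A")
X = TypeVar("X")
Y = TypeVar("Y")
Z = TypeVar("Z")


def choose(a1: A, a2: A) -> A:
    # Return one of the two arguments at random.
    return random.choice([a1, a2])


def partial(fn: Callable[[X, Y], Z], x: X) -> Callable[[Y], Z]:
    # Assumed body (elided in the source): bind the first argument of fn.
    def bound(y: Y) -> Z:
        return fn(x, y)

    return bound


p = partial(choose, None)

# Each call returns either the bound None or the argument passed in.
int_results = {p(2) for _ in range(200)}
str_results = {p("hello") for _ in range(200)}
```

At runtime nothing ties the second argument to `None`: `int_results` can only ever contain `None` and `2`, and `str_results` only `None` and `"hello"`, matching what the direct call `choose(None, 2)` would produce.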
