Three Trends from MLSys 2026



May 29, 2026

Michael Dunn-OConnor

Brian Zhang

Shouzheng Liu

Engineering

MLSys 2026 provided an excellent overview of the current state of inference across research and industry. With six sessions on LLM serving this year (twice as many as last year), the program covered opportunities and challenges at the core of Modular’s recent work. Modular was glad to sponsor the conference, and our team noted three trends that stood out across the talks, posters, and keynotes. These are all topics that Modular has been addressing from first principles, with the advantage of our unique stack.

Trend 1: Agents are writing everything from kernels to systems

Monday’s keynote set the tone. Mark Saroufim's When AI Starts Writing Systems Code showed examples of novice kernel developers using AI agents to write kernels that could place them near the top of competitive hackathons. He then comically undercut some of these agentic achievements by demonstrating how agents cheat the benchmarks, optimizing for results that would never generalize beyond the test cases provided. Rather than waiting for a generation of agents that always play by the rules, Saroufim outlined the need for “zero trust” verification: benchmarks comprehensive enough that they do not rely on good-faith submissions.

Lidong Zhou's Tuesday keynote, The Next Horizon of Systems: From MLSys to System Intelligence, argued that the systems community needs to plan for AI agents as the primary authors of low-level code. Zhou presented a Rust microkernel (Nanvix) in which AI-generated specifications and proofs are verified module by module. On a 150-task proof generation benchmark, the pass rate climbed from 2 percent (GPT-4o, prompt-based) to 91.3 percent (fine-tuned LLaMA-3.1 8B with self-debugging). The talk also documented the shortcuts the model takes when it cannot complete a proof: wrapping code in external_body to bypass the verifier, planting false postconditions, and shifting the proof burden to callers.
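Saroufim's “zero trust” point can be made concrete with differential testing: compare a candidate kernel against a trusted reference on many randomized inputs, not only on the published test case. The following is a minimal Python sketch of that idea, not anything shown in the talk. The names `reference_softmax`, `honest_softmax`, `cheating_softmax`, `verify`, and `PUBLISHED_CASE` are hypothetical, and softmax is used only because it is a common kernel-porting target.

```python
import math
import random

# Hypothetical: the single input a naive benchmark publishes.
PUBLISHED_CASE = (1.0, 2.0, 3.0)


def reference_softmax(xs):
    """Trusted reference: numerically stable softmax (subtract the max first)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def honest_softmax(xs):
    """A correct alternative implementation via log-sum-exp."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [math.exp(x - lse) for x in xs]


def cheating_softmax(xs):
    """An 'agent-written' kernel that special-cases the published input."""
    if tuple(xs) == PUBLISHED_CASE:
        return reference_softmax(xs)  # correct only on the known case
    return [1.0 / len(xs)] * len(xs)  # wrong for almost every other input


def verify(candidate, reference, trials=200, seed=0, tol=1e-9):
    """Check candidate against reference on the published case plus random inputs."""
    rng = random.Random(seed)
    cases = [list(PUBLISHED_CASE)]
    for _ in range(trials):
        n = rng.randint(1, 64)                # vary the length
        scale = rng.choice([1.0, 10.0, 100.0])  # vary the magnitude
        cases.append([rng.uniform(-scale, scale) for _ in range(n)])
    for xs in cases:
        got, want = candidate(xs), reference(xs)
        if len(got) != len(want) or any(abs(g - w) > tol for g, w in zip(got, want)):
            return False
    return True


print(verify(honest_softmax, reference_softmax))    # True
print(verify(cheating_softmax, reference_softmax))  # False
```

A benchmark consisting only of the published case would accept `cheating_softmax`; randomizing lengths and magnitudes makes memorizing the inputs impractical, which is the spirit of not relying on good-faith submissions.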
The shared conclusion of these talks was that agentic engineering requires substantially greater rigor in specification, design, and validation. Subsequent talks showcased specific applications of agentic engineering across kernels and systems. AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization describes a closed-loop system in which an LLM proposes accelerator kernel variants, profiles them, and feeds the results back to itself. FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems, presented in the ML for Systems session, frames this as a feedback loop: the benchmark exists to give the agents something to optimize against.

The kernel-author pain that motivates all of this also showed up directly. FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling moved its implementation off CUDA C++ templates and into CuTe-DSL embedded in Python, with the explicit goal of letting downstream developers extend the kernel without modifying the core framework. HipKittens: Fast and Furious AMD Kernels and ParallelKittens: Systematic and Practical Simplification of Multi-GPU AI Kernels both argued for a simpler set of abstractions for kernel engineering. The shared assumption is that human kernel authors are not going to write a new template forest for every accelerator generation, and that simpler abstractions benefit agents at least as much as human developers.

Modular’s Solution

Modular engineers and the community are already writing Mojo code with agents and discovering that many of its features are well suited to agentic engineering. Mojo’s robust type system, efficient compilation, and clear error messages support the tight feedback loops of agentic development with human verification. Our blog post on Translating to Mojo via AI Agents provides a practical guide to this workflow.
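The closed loop that AccelOpt describes (propose kernel variants, profile them, feed the results back) has a simple shape that also matches the tight compile-and-test loops mentioned for Mojo. Below is a minimal Python sketch under loud assumptions: `propose` is a deterministic neighbor search standing in for the LLM, `profile` is a synthetic cost model rather than a hardware measurement, and the tile-size knob, `blocked_sum`, and `Feedback` are hypothetical names, not part of AccelOpt, FlashInfer-Bench, or Modular's tooling.

```python
from dataclasses import dataclass, field

# Hypothetical tuning knob: the tile size of a blocked reduction "kernel".
START_TILE = 8
STEPS = (-8, -4, -2, -1, 1, 2, 4, 8)


@dataclass
class Feedback:
    """Results fed back to the proposer after every round."""
    history: list = field(default_factory=list)  # (tile, cost); cost is None if verification failed

    def best(self):
        ok = [(cost, tile) for tile, cost in self.history if cost is not None]
        return min(ok) if ok else None


def propose(feedback):
    """Stand-in for the LLM: propose untried neighbors of the best tile so far."""
    best = feedback.best()
    center = best[1] if best else START_TILE
    tried = {tile for tile, _ in feedback.history}
    return [center + d for d in STEPS if center + d >= 1 and center + d not in tried]


def blocked_sum(xs, tile):
    """The 'kernel' under tuning: reduce xs one tile at a time."""
    return sum(sum(xs[i:i + tile]) for i in range(0, len(xs), tile))


def profile(tile):
    """Synthetic cost model standing in for real profiling (assumption)."""
    return abs(tile - 32) + 32 / tile


def tune(rounds=10):
    xs = list(range(1000))
    expected = sum(xs)
    feedback = Feedback()
    for _ in range(rounds):
        for tile in propose(feedback):
            # Verify correctness before trusting any timing.
            if blocked_sum(xs, tile) != expected:
                feedback.history.append((tile, None))
            else:
                feedback.history.append((tile, profile(tile)))
    return feedback.best()


print(tune())  # (1.0, 32)
```

The verification gate sits before profiling so that a fast but wrong variant can never become the "best" result the proposer builds on; in a real system the proposer would be the model and the cost would come from running on hardware.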
The official modular/skills package plugs into Claude Code, Cursor, and other coding agents and corrects misconceptions and out-of-date patterns that models may produce. In the post, Brad Larson walks an agent from a CUDA softmax kernel (Szymon Ożóg's FastSoftmax) to a portable Mojo version that runs on NVIDIA, AMD, and Apple silicon in a single session. Automatika Robotics did the same for autonomous-navigation kernels in their EMOS / kompass-core workload and reported 15.973 ms versus a 16.358 ms SYCL/CUDA baseline on the agent's first pass, with no Mojo-side optimization. Another blog post, from Ehsan Kermani, demonstrates effective agentic engineering to create Mojo libraries that are thoroughly tested and meet a real community need.

The second connection is on the kernel side. The composable abstractions the Modular kernel team created are documented in our Structured Mojo Kernels blog series. This pattern breaks production kernels into three components (TileIO,...

