[AINews] All Model Labs are now Agent Labs - Latent.Space
AINews: Weekday Roundups<br>[AINews] All Model Labs are now Agent Labs<br>a quiet day lets us tie together a few quotes as all model labs become agent labs<br>May 23, 2026<br>∙ Paid
Ahead of OpenAI’s likely IPO filing next week, Greg adds the latest in a series of comments showing that Model Labs are increasingly building Agents as the product:
The quote is a big reversal from a position ~uniformly held by anyone who worked at Team Big Model, including his previous head of OpenAI Labs:
This comes with the shuttering of AI21’s model team, which is now pivoting to agents:
and even the venerable DeepSeek is now building a “Harness team” for the first time:
The “Systems over Models” people will take this as validation of what they have been saying all along… except for the nuance that cotraining models with harnesses opens the door to closing off access to models even further: if you can effectively posttrain a model to perform meaningfully only with your closed-source agent, then you get to funnel the majority of users to your agent at the expense of your model/API co-opetition.<br>But that’s a topic for a much larger discussion…<br>AI News for 5/4/2026-5/5/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter Recap
Agent Products, Harnesses, and the Shift Beyond “Just the Model”<br>The product surface is moving up-stack: A recurring theme was that model quality alone is no longer the moat; the winning product is increasingly model + harness + workflow + UI + memory + economics. @gdb put it bluntly: “the model alone is no longer the product,” while @dzhng argued top-tier products need model <> harness <> product symbiosis. The same pattern shows up in practice: @signulll framed ambient AI and agentic AI as the new seam of computing interfaces, and @teortaxesTex noted that harness research still risks converging on “replicate Claude Code” instead of exploring broader interfaces.
Coding-agent product differentiation is becoming concrete: OpenAI shipped another substantial Codex update via “codex thursday no. 6” with appshots, /goal improvements, remote computer use while locked, annotation mode, plugin sharing, and analytics. @gdb separately highlighted Appshots, while users reported meaningful workflow shifts: @gdb said it’s hard to remember coding before Codex, and @reach_vb said they haven’t opened an IDE in over a month. But product rough edges remain: @theo praised T3 Code’s remote feature as ahead of alternatives, then contrasted it with buggy remote workflows in Codex in a follow-up post. On the Claude side, @ClaudeDevs expanded auto mode to the Pro plan and added Sonnet 4.6 support; @_mohansolo also had to clarify and patch IDE support in Antigravity 2.0 after user backlash.
Model Performance, Cost Curves, and Frontier Competition<br>DeepSeek’s pricing move was the biggest market signal: @deepseek_ai made the 75% DeepSeek-V4-Pro discount permanent, triggering strong reactions because it materially changes the cost/performance frontier. @ArtificialAnlys quantified first-party pricing at $0.435/M input, $0.87/M output, $0.0036/M cached input, estimating a blended ~$0.18/M and placing V4 Pro on the Pareto frontier for intelligence vs run cost. They estimate running their Intelligence Index on V4 Pro costs ~3x less than Gemini 3.1 Pro Preview, ~12x less than GPT-5.5, and ~19x less than Claude Opus 4.7. Community reaction centered on DeepSeek’s push toward “intelligence too cheap to meter,” as @scaling01 put it. @Yuchenj_UW and @kimmonismus both emphasized the magnitude of the cut.
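For readers who want to sanity-check how a "blended" per-token price depends on workload, here is a minimal sketch of a weighted average over the three quoted rates. The token mixes are hypothetical: the recap does not state the weights behind the ~$0.18/M estimate, so this is not an attempt to reproduce that figure, only to show how much the cached-input share moves the result.

```python
# Per-million-token prices quoted in the recap (USD).
PRICE_INPUT = 0.435
PRICE_OUTPUT = 0.87
PRICE_CACHED_INPUT = 0.0036


def blended_price(input_share, output_share, cached_share):
    """Weighted average price per million tokens for a given token mix.

    Shares are relative weights (they need not sum to 1).
    """
    total = input_share + output_share + cached_share
    if total <= 0:
        raise ValueError("token shares must sum to a positive value")
    weighted = (
        PRICE_INPUT * input_share
        + PRICE_OUTPUT * output_share
        + PRICE_CACHED_INPUT * cached_share
    )
    return weighted / total


# Hypothetical mix (assumption): 3 parts uncached input, 1 part output, no cache hits.
no_cache = blended_price(3, 1, 0)

# Hypothetical mix (assumption): cache-heavy agent loop, 1 input : 1 output : 8 cached.
cache_heavy = blended_price(1, 1, 8)

print(f"no cache:    ${no_cache:.4f}/M")
print(f"cache heavy: ${cache_heavy:.4f}/M")
```

Because cached input is priced at roughly 1/120 of uncached input, any blended figure is dominated by the assumed cache-hit share, which is worth keeping in mind when comparing blended prices across providers.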
Gemini Flash improved, but usage feedback was mixed: @OfficialLoganK reported Gemini 3.5 Flash making major progress over 3.1 Pro on GDPval, claiming Flash is now “competing at the frontier,” and @Designarena placed it 16th overall on Design Arena, a 16-position jump from Gemini 3 Flash Preview. But several builders pushed back on usefulness vs benchmark gains: @Alezander907 saw only slight browser-agent improvement at higher cost, @giffmana argued this isn’t “Flash progress” if the brand still implies cheapness, and @jeremyphoward said the model feels optimized to max evals rather than cooperate with humans. That aligns with broader eval skepticism from @HamelHusain, who argued current tooling underweights qualitative, HITL judgment.
Qwen and Chinese frontier models keep compressing the race: The official @Alibaba_Qwen teasers and a long third-party review from @ZhihuFrontier portrayed Qwen3.7-Max as a meaningful step up, especially in instruction following, context reliability, and stability, while still suffering from verbosity and high token usage. Elsewhere, @scaling01 claimed recent ALE-Bench runs show Chinese models like Kimi-K2.6,...