Why frontier LLMs can't read the hard documents without experts involved

The 76% Wall: Where the Model Labs Stop Eating IDP

IDP-Software

Christopher Helm covers the intelligent document processing market, tracking vendor developments, technology trends, and enterprise adoption patterns. His analysis focuses on how AI-powered automation reshapes document workflows across industries including financial services, healthcare, insurance, and logistics.

The month the price went to cents

In June 2026 the cost of reading a document with a machine stopped being a line item worth managing. Independent cost analysis puts direct Gemini Flash extraction at roughly $0.17 per 1,000 pages, against $1.50 for AWS Textract's basic OCR and $30 for Google's own legacy Document AI Form Parser. A developer with a stack of standard invoices no longer evaluates an intelligent document processing vendor. They paste the file into a model they already pay for and read the JSON that comes back.

The incumbent cloud platforms have conceded the point in their own release notes. Google's Document AI now routes its Layout Parser through Gemini 3 Flash and Gemini 3 Pro, with image and table annotations reaching general availability on May 27, and every legacy pre-2022 processor for identity, tax, mortgage and procurement documents deprecated effective June 30, 2026. The product that defined cloud OCR for a decade is now a wrapper around the same general model a developer can call directly. When the platform that invented the category retires its own purpose-built processors in favor of a chat model, the purpose-built layer is over.

The independent benchmarks agree. The Nanonets IDP Leaderboard, run across more than 9,000 real documents, has Gemini 3 Flash leading key information extraction at 91.1% and Gemini 3.1 Pro on top for OCR, tables and visual question answering. For the easy half of the market (clean invoices, printed forms, structured PDFs), the model is the pipeline. The loop the agentic hotfolder post described in May is now priced in cents and shipped by the people who make the models.

The labs are not selling extraction. They are selling the desk.

The repricing of OCR is the small story. The large one is that the frontier labs have stopped pitching chat and started pitching the knowledge-work desk: the same desk where document-heavy professional work lives.
OpenAI shipped Workspace Agents on April 22, the successor to Custom GPTs, with scheduled agents that connect to Google Drive, SharePoint, Box and Salesforce and improve through memory. The launch cited OpenAI's own internal use: 24,771 K-1 tax forms processed, weekly business reports automated. GPT-5.5, released two days later, is positioned in language that would have been a category claim for an IDP vendor a year ago: "creating documents and spreadsheets, operating software, and moving across tools until a task is finished." Then on May 27, OpenAI and Thrive put a self-improving tax AI into production with a network of thirty-plus accounting firms, reporting 97% accuracy on K-1 forms across a 7,000-return pilot. That is not a demo. That is document-heavy regulated professional work, automated, with a number attached.

Anthropic is making the same move from the other side. Claude Cowork, launched February 24, is sold as an "autonomous digital colleague" that reads and extracts key terms from DocuSign agreements, pulls structured data out of receipt images into Excel with formulas, and ships with ten finance agent templates for KYC file screening and pitchbook building. Anthropic's financial-services positioning claims 83% accuracy on complex Excel tasks and a lead on the Vals AI Finance Agent benchmark.

Sam Altman's framing for where this goes is explicit: agents trusted to handle "multi-day and multi-week tasks, operating proactively much like a senior human employee." This is the threat the IDP middle has not priced. The model labs are not trying to be a better extractor. They are trying to own the agent that reads the document, reasons over it, and does the downstream work, the agentic workflow layer that vertical platforms like Coupa and UiPath also want. Extraction is a feature of that agent, not a product alongside it.

The plumbing went open in the same four weeks

While the labs took the bottom, the rest of the stack standardized the middle.
The document format itself, the proprietary intermediate representation that several vendors treated as a moat, became an open standard in June. On June 23, ABBYY shipped FineReader Engine 12.8 with export support for DocLang, and the Linux AI & Data Foundation announced DocLang as a vendor-neutral AI-native document standard co-founded by ABBYY, IBM, NVIDIA, Red Hat and HumanSignal. When five...
