The Collaborative Exoskeleton of AI Science

Asimov’s Addendum


Tim O'Reilly May 15, 2026

There is a lot of hope that AI will advance the progress of science, but unfortunately, the collision between AI and scientific publishing has not gone well.

When an AI coding agent writes code, it operates within a rich ecosystem of version control, pull requests, code review, CI/CD pipelines, dependency management, and package registries. GitHub wasn’t designed for AI, but it turned out to be foundational infrastructure that makes AI-assisted software development work.

Science has an equivalent set of infrastructure for handling identity, provenance, integrity, and discoverability. Systems like arXiv, DOIs, Crossref, DataCite, ORCID, OpenAlex, ROR, Retraction Watch, and PubMed form a kind of collaborative exoskeleton for scientific publishing and, by extension, for modern scientific knowledge. Much as GitHub has been adapted for AI development, this infrastructure needs to be adapted for AI use in science.

The problems fall into several categories:

Hallucinated citations. When AI generates or assists with scientific papers, it routinely fabricates references. A multi-model study found that only about a quarter of AI-generated citations were entirely correct. Roughly 40% were erroneous or fabricated. Hallucinated citations have been found in papers accepted at NeurIPS and ICLR, the top AI conferences. GPTZero’s investigation found that about 2% of papers accepted at NeurIPS 2025 contained at least one fabricated reference. The peer reviewers missed them all. AI researchers, who understand hallucinations better than anyone, fell victim because convenience trumped verification.

Retracted paper propagation. AI tools are citing retracted papers without flagging them. Retraction Watch co-founder Ivan Oransky has noted that building a comprehensive retraction database is resource-intensive. Yet AI tools that claim to support scientific research are not even integrating the databases that already exist.

A study of 21 chatbots found that, on average, they correctly identified fewer than half of retracted papers when asked, and they produced substantial false positives as well. MIT Technology Review reported that AI chatbots are relying on material from retracted papers to answer questions, with some tools returning retracted articles with no retraction notice at all.

Training on compromised literature. AI models trained on scientific corpora inevitably absorb retracted, fraudulent, and paper-mill-generated content. Between 2024 and 2025, the retraction crisis accelerated dramatically. A recent bibliometric analysis found that AI-driven retractions have shifted from sporadic anomalies to a systemic crisis, with generative tools enabling paper mills to penetrate the highest levels of scholarly indexing. AI doesn’t know the difference between a landmark paper and a paper-mill product. Without integration with retraction databases and quality signals, this pollution propagates.

Generation of “AI slop” papers. Paper mills were already a problem, but AI has made the problem far worse. In a world of “publish or perish,” scholars have strong incentives to generate poor-quality papers, cite their own work excessively, and otherwise introduce noise into the system.

As the MIT VRAIX project puts it, because large language models are nondeterministic, “the same prompt can produce different answers, each delivered with fluency and confidence. These systems routinely present statements without verifiable sources, cite fabricated or incorrect references, blur the line between summarization and invention, and favor what’s statistically popular over what’s trustworthy. Even when real citations are included, users often have no easy way to determine whether those references are relevant, reliable, or even supportive of the claim being made.”

Tools to address these problems largely already exist, but they haven’t been integrated into AI systems. New tools are also being developed.

As the AI labs turn their attention to AI for science, they should also be exploring what the future infrastructure of scientific knowledge sharing might look like. That is the subject of this article.

The infrastructure of collaboration

DOIs and Crossref. Every legitimate scholarly work has (or should have) a DOI, a persistent digital identifier maintained by Crossref. Crossref’s REST API lets you resolve a DOI and verify that a paper actually exists, with the correct title, authors, journal, and year. This is the most basic hallucination check imaginable, and yet most AI systems don’t perform it. Why isn’t this kind of validation built into every AI system that touches scientific literature?

DOIs are not a panacea. They have been hacked both for fun and profit. As Geoffrey Bilder, the former director of technology for Crossref, noted, there are DOIs that point to a South Park movie, a fake...
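
To make the basic check described above concrete, here is a minimal Python sketch of what validating a citation against the Crossref REST API might look like. It is an illustration under stated assumptions, not a description of how any existing AI system works. The function names (fetch_crossref_record, check_citation, normalize), the shape of the claimed citation dict, and the matching rules are all hypothetical choices made for this example. The endpoint https://api.crossref.org/works/{doi} and the title, author, and issued fields are part of Crossref's public API.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref_record(doi, mailto=None):
    """Return Crossref's metadata dict for a DOI, or None if Crossref has no record."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    if mailto:
        # Crossref asks polite clients to identify themselves with a contact address.
        url += "?mailto=" + urllib.parse.quote(mailto)
    request = urllib.request.Request(
        url, headers={"User-Agent": "citation-check-sketch/0.1"}  # hypothetical client name
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # DOI is not registered with Crossref
        raise


def normalize(text):
    """Lowercase and keep only letters and digits, so formatting differences are ignored."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def check_citation(claimed, record):
    """Compare a claimed citation with a Crossref record and return a list of problems.

    `claimed` is an illustrative dict with "title", and optionally "year" and
    "authors" (a list of family names). An empty result means no mismatch was found.
    """
    if record is None:
        return ["DOI not found in Crossref"]
    problems = []

    # Crossref returns titles as a list; compare against the first one.
    titles = record.get("title") or [""]
    if normalize(claimed["title"]) != normalize(titles[0]):
        problems.append("title mismatch")

    # Publication date is stored as {"date-parts": [[year, month, day]]}.
    date_parts = record.get("issued", {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None
    if claimed.get("year") is not None and claimed["year"] != year:
        problems.append("year mismatch")

    # Every claimed family name should appear among the record's authors.
    families = {normalize(a.get("family", "")) for a in record.get("author", [])}
    missing = [n for n in claimed.get("authors", []) if normalize(n) not in families]
    if missing:
        problems.append("authors not in record: " + ", ".join(missing))
    return problems
```

In use, a tool would call check_citation(claimed, fetch_crossref_record(claimed_doi)) for each reference and route any non-empty result to a human. An empty result only means the metadata matches what Crossref holds. It says nothing about whether the paper supports the claim being made or whether it has been retracted, and exact matching after normalization is deliberately strict, so a production system would likely want fuzzier title comparison.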
