Why Data Marketplaces Don't Work: The Geometry of Data Markets


The Geometry of Data Markets | Brickroad

The Liquidity Thesis

About a month ago, Sean Cai published a great article on X positing that 2026 will be the year data becomes liquid. I agree, so much so that last year I, along with my co-founders, quit our jobs, sold a company, and chose to forgo more lucrative near-term opportunities to build Brickroad to bring about that reality. But having been in the trenches of this "information services" space in what is proving to be the most transformative time in this sector's history, I can tell you that the path to liquidity is not the one most people imagine.

Conventional venture wisdom has been something like this: AI is hungry for data, new data suppliers are emerging everywhere, and eventually supply will meet demand in some kind of functioning market. More data, more buyers, more deals, and liquidity follows naturally.

I've watched this thesis play out repeatedly across venture and beyond, and it's wrong. The achievement of "data liquidity" will not be a product of supply. It will be a product of better infrastructure, infrastructure that until recently, until Brickroad, did not exist.

The information services industry, which comprises data aggregators, brokers, and data-as-a-service providers, is a $200B+ market that still operates the way it did twenty years ago. When a corporation, hedge fund, or AI lab wants to acquire data, it enters into bilateral negotiations that span search, integration, contracting, licensing, cost negotiation, and quality evaluation. Each deal forces n × m wiring between sources and endpoints. Engineers build bespoke ingestion pipelines. Researchers and quants validate data quality and backtest. Legal teams review agreements.
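To make the n × m wiring concrete, here is a minimal arithmetic sketch. The function names are hypothetical, and the n + m "shared layer" figure is an assumption added purely for contrast; it is not a description of how any particular product works.

```python
def bilateral_integrations(n_sources: int, m_endpoints: int) -> int:
    """Every source is wired separately to every endpoint: n * m pipelines."""
    return n_sources * m_endpoints


def shared_layer_integrations(n_sources: int, m_endpoints: int) -> int:
    """Assumed contrast case: each party connects once to a common layer: n + m."""
    return n_sources + m_endpoints


if __name__ == "__main__":
    # Illustrative sizes only; these are not figures from the article.
    for n, m in [(5, 5), (20, 50), (100, 200)]:
        print(
            f"{n} sources x {m} endpoints: "
            f"{bilateral_integrations(n, m)} bilateral vs "
            f"{shared_layer_integrations(n, m)} shared-layer connections"
        )
```

The point of the comparison is only the growth rate: bilateral wiring grows with the product of the two sides, so each new source or endpoint adds a whole row or column of bespoke work.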
A typical enterprise data procurement cycle can take three to six months from initial discovery to production integration, with legal review alone consuming four to eight weeks (that's assuming there is even a market for the data being sold!).

Every integration is bespoke; every integration carries its own coordination overhead.

That coordination overhead is a pervasive friction, one which has choked market efficiencies in the information services industry and, by extension, innovation as a whole. And the intuitive response to this problem has always been the same: build a marketplace. If bilateral negotiation is the bottleneck, aggregate supply into a catalog, let buyers browse, and let the platform handle the transaction. It is the geometry that worked for software (app stores), for cloud compute (AWS), for labor (Upwork), and for virtually every other digital good that has achieved liquidity in the last two decades. It is also, in the case of data, wrong.

Most recently, Databricks, Snowflake, and AWS have each built "data marketplaces" intended to reduce this overhead. All three have failed to achieve meaningful adoption, landing instead on products that trend more towards product directories than programmatic data procurement networks. The reason is structural: a marketplace assumes that data can be listed, browsed, and transacted like software on an app store. But data procurement is not a catalog problem. It is a coordination problem. Every dataset requires schema negotiation, format alignment, licensing review, and quality validation before a buyer even knows whether the data is useful. A marketplace can list the inventory, but it cannot perform the work of matching, evaluating, and integrating that inventory to a buyer's specific needs.
Data simply does not sell itself.

To understand why marketplaces are not, and will never be, the substrate for data liquidity, I think it is important to understand the geometry of data markets: the shape of how information has always flowed between those who have it and those who need it.

This post traces that evolution, from before the telegraph to the transformer, revealing how the geometry shifted in each cycle, where opportunities surfaced, and why incumbents lost. The pattern that emerges presents the current moment as a unique and underappreciated opportunity for those willing to build the infrastructure that makes data liquid.

The Landscape

Before tracing the history, it helps to map the current geometry: who sits on each side of the data market today, and why the shape of their interactions matters.

On the supply side, there are five distinct players. The first, and most valuable, are first-party data providers: companies that generate proprietary data as a byproduct of their core operations but do not think of themselves as data companies. A logistics firm sitting on years of shipping route optimization data. A healthcare network with decades of patient outcome records. These are the sources with the most alpha, precisely because their data has never been leveraged for trading, research, training, or otherwise. For many paying users of data, the value of a dataset is in its private state, and the moment it is listed on a public marketplace, that value...

