Designing an MCP Server for Unstructured Data | Mark Kikta<br>Introduction<br>Agents are most valuable when connected to your data. However, most data is unstructured: PDFs, Word documents, emails, etc. Unlike structured databases, unstructured documents cannot be queried directly. Agents often have to reread entire documents to recover context, wasting both time and tokens. And once your agent has read a document, that context is often local to your machine or even to that agent.<br>🧶 Ariadne processes unstructured documents into a searchable vector index and exposes them through a Model Context Protocol (MCP) server. This allows you to save tokens and share context between agents and teammates.<br>Let’s walk through the design of Ariadne.<br>Requirements<br>The most important functional requirements are:<br>Users can upload any number of documents and expect all of their context to be faithfully captured and durably stored.<br>Users can connect any MCP client to the MCP server to access their context.<br>The MCP server serves full documents retrieved by identifier and chunks retrieved through semantic similarity search.<br>The non-functional requirements are:<br>The architecture should not preclude future horizontal scaling, although the current implementation targets a local single-machine deployment.<br>Document ingestion is expected to be asynchronous and may take several seconds. In contrast, the MCP server should return tool results reasonably quickly, say, in …<br>The target deployment is a single local machine that cannot host large LLMs.<br>Document collections are assumed to remain relatively small, which makes a local vector database practical while also improving semantic search quality by limiting the search space.<br>Data Model<br>There are only two pieces of information to retrieve: full documents and semantic chunks. 
These entities are modeled separately because they have different access patterns: full documents are retrieved by identifier, while chunks are retrieved through semantic similarity search.<br>ChromaDB supports both access patterns and is a good fit for the initial implementation. It is lightweight, easy to deploy locally, persists to disk, and provides metadata filtering alongside vector similarity search. In ChromaDB, data is grouped into collections, and each item in a collection has a document, an embedding, and user-defined metadata. Ariadne uses these primitives to model its two entities as collections:<br>Document represents the full text content of a document. Its only metadata is the document name, and it does not need an embedding because it is only looked up directly (by id or name), never by similarity.
Chunk represents a chunk of text content from a document. Its only metadata fields are document_id, which refers to its parent record in Document, and chunk_idx, its position within the parent document’s sequence of chunks.
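To make the two access patterns concrete, here is a minimal in-memory sketch of the Document and Chunk records in plain Python. It is not written against ChromaDB itself, and the names Store, get_document, search, chunks_of, and cosine, as well as the toy 2-dimensional embeddings, are hypothetical stand-ins; in Ariadne, ChromaDB stores these records and an embedding model produces the vectors.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical in-memory stand-in for Ariadne's two ChromaDB collections.
# It mirrors the record shapes described above; it is NOT ChromaDB's API.


@dataclass
class Document:
    id: str
    text: str       # full text content of the document
    metadata: dict  # only {"name": ...}; no embedding is stored


@dataclass
class Chunk:
    id: str
    text: str
    embedding: list[float]  # produced by an embedding model in practice
    metadata: dict          # {"document_id": ..., "chunk_idx": ...}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Store:
    documents: dict[str, Document] = field(default_factory=dict)
    chunks: list[Chunk] = field(default_factory=list)

    def get_document(self, doc_id: Optional[str] = None,
                     name: Optional[str] = None) -> Optional[Document]:
        # Access pattern 1: direct lookup by id, or by the "name" metadata.
        if doc_id is not None:
            return self.documents.get(doc_id)
        for doc in self.documents.values():
            if doc.metadata.get("name") == name:
                return doc
        return None

    def search(self, query_embedding: list[float],
               document_id: Optional[str] = None, k: int = 3) -> list[Chunk]:
        # Access pattern 2: similarity search over chunks, optionally
        # restricted to one parent document via a metadata filter.
        candidates = [c for c in self.chunks
                      if document_id is None
                      or c.metadata["document_id"] == document_id]
        candidates.sort(key=lambda c: cosine(query_embedding, c.embedding),
                        reverse=True)
        return candidates[:k]

    def chunks_of(self, document_id: str) -> list[Chunk]:
        # chunk_idx recovers the original order of a document's chunks.
        own = [c for c in self.chunks if c.metadata["document_id"] == document_id]
        return sorted(own, key=lambda c: c.metadata["chunk_idx"])


# Usage with toy 2-dimensional embeddings (real ones come from the model).
store = Store()
store.documents["d1"] = Document("d1", "Intro. Results.", {"name": "report.pdf"})
store.documents["d2"] = Document("d2", "Action items.", {"name": "notes.docx"})
store.chunks += [
    Chunk("d1-1", "Results.", [0.0, 1.0], {"document_id": "d1", "chunk_idx": 1}),
    Chunk("d1-0", "Intro.", [1.0, 0.0], {"document_id": "d1", "chunk_idx": 0}),
    Chunk("d2-0", "Action items.", [0.7, 0.7], {"document_id": "d2", "chunk_idx": 0}),
]
top = store.search([1.0, 0.0])  # ranks d1-0, then d2-0, then d1-1
```

In ChromaDB itself, the first pattern roughly corresponds to a collection `get` (by ids or with a `where` metadata filter) and the second to a collection `query` with an optional `where` filter on document_id; the sketch only shows why the two entities want different storage shapes.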
API<br>From the user’s perspective, interacting with Ariadne consists of three steps:<br>Create a pair of collections to hold complete documents and semantic chunks.<br>Upload documents for asynchronous processing.<br>Query the indexed data via the MCP server.<br>On the data ingestion side, there are two endpoints:<br>POST /create_collection<br>POST /process_document<br>create_collection configures a collection with a given name, and process_document starts the processing pipeline. The processing server extracts text, performs any configured LLM enrichments, chunks the document, and stores both the full document and its chunks in ChromaDB.<br>Once documents have been indexed, they become available through two MCP tools:<br>get_full_document(name | id)<br>search(query, document_id?)<br>get_full_document retrieves the complete source document from the Document collection by id or name, while search performs a semantic similarity search for query over the Chunk collection, optionally filtered by document_id. New documents can continue to be uploaded while previously indexed data is being queried.<br>High-Level Design<br>At a high level, the system is composed of a few key components, plus the embedding and enrichment models they call out to:<br>A processing server that asynchronously processes uploaded documents and inserts the documents and their chunks into the database. 
The processing pipeline can be configured to call an LLM to enrich documents by annotating pictures, interpreting code or figures, and/or extracting entities.<br>The core vector database, which handles embeddings by calling out to an embedding model.<br>An MCP server that accepts requests from any MCP client, searches the vector database through its two tools, and returns the results to the client.<br>The following diagram provides an overview of the system:<br>In the processing server, documents are converted via preconfigured Docling pipelines. These pipelines can be configured to run LLM enrichments locally or via an API, though enrichment is not yet implemented. The converted documents are then serialized before being added to the...