Bayer's PRINCE: a production agentic RAG system

Building Reliable Agentic AI Systems

A Case Study in building production-ready agentic AI systems

This article presents the Preclinical Information Center (PRINCE), a cloud-hosted platform developed by Bayer AG with Thoughtworks to address data-access challenges in preclinical drug development. PRINCE uses Agentic Retrieval-Augmented Generation (RAG) and Text-to-SQL to integrate decades of safety study reports. We describe PRINCE's evolution from keyword-based search to a research assistant that can answer complex questions and draft regulatory documents. We reflect on key engineering decisions through two lenses: context engineering (how information was shaped and routed between specialized agents) and harness engineering (how orchestration, recovery, and observability were built around the models to maintain control and reliability). The system prioritizes trust through transparency, explainability, and human-in-the-loop integration. PRINCE shows how this approach can improve data accessibility and research efficiency in pharmaceuticals while preserving governance and compliance.

16 June 2026

Sarang Sanjay Kulkarni

Sarang Kulkarni is a Principal Consultant at Thoughtworks, working at the intersection of software engineering, data platforms, and applied AI. He focuses on building production-grade GenAI systems, particularly Retrieval-Augmented Generation (RAG) and multi-agent workflows, and helps teams take these systems from early ideas to real-world use. Sarang also contributes to Thoughtworks’ Global AI Service Development team and teaches an O’Reilly course on building production-ready RAG applications.

Contents

The Challenge: Navigating the Preclinical Data Maze

The Solution: PRINCE - An Evolutionary Platform

System Architecture: Engineering a Reliable Agentic RAG System

The Agentic RAG System

Clarify User Intent

Think & Plan: Process Reflection

The Researcher Agent

The Reflection Agent: Data Validation and Sufficiency

The Writer Agent: Answer Synthesis and Formatting

Building Trust in a Production LLM System

Transparency and Explainability

Evaluation

Monitoring

Engineering for Resilience: Error Handling and Recovery

Enhancing Data Quality: Named Entity Recognition and Annotation

The Journey Continues: Iterative Development

Conclusion

Preclinical drug discovery is inherently complex and data-intensive. Researchers face the significant challenge of efficiently accessing and analyzing vast volumes of information generated during this critical phase. Traditional keyword-based search methods, often reliant on rigid Boolean logic, frequently fall short when confronted with the nuanced and intricate nature of preclinical research questions.

The advent of Large Language Models (LLMs) has presented a new opportunity. Retrieval-Augmented Generation (RAG) combines the generative power of LLMs with the precision of information retrieval systems, and it has emerged as a promising technique. It lets researchers pose complex questions in natural language and receive accurate, context-rich answers grounded in proprietary data.

Recognizing this potential early, Bayer committed to exploring how these technologies could address longstanding challenges in preclinical research.

In this post, we share that journey: how Bayer's early investment in generative AI resulted in PRINCE, a system built on Agentic RAG. This case study explores the technical architecture, engineering decisions, and lessons learned in turning preclinical data retrieval from a maze into a conversational experience.

Many of the engineering decisions behind PRINCE can now be understood through the lens of context engineering and harness engineering, although when the system was first designed we did not use these terms. Context engineering shaped what information each model received, what it did not receive, and how context moved between specialized steps such as research, reflection, and writing. Harness engineering shaped the scaffolding around the models: orchestration, tool boundaries, state persistence, retries, fallbacks, validation, reflection loops, observability, and human review.
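To make the distinction concrete, here is a minimal Python sketch of a research, reflection, and writing loop wrapped in a small harness. It is an illustration under stated assumptions, not PRINCE's actual implementation: the names `RunState`, `call_step`, `run_pipeline`, and the `demo_*` stubs are all hypothetical, and the stubs stand in for real model and tool calls. The harness part handles retries, fallbacks, persisted state, and a trace; the context part is visible in what each step is allowed to see.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class RunState:
    """State the harness keeps between steps (hypothetical shape)."""
    question: str
    evidence: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)  # simple observability log


def call_step(name: str, fn: Callable[[], Any], state: RunState,
              attempts: int = 2, fallback: Any = None) -> Any:
    """Harness concern: retry a step, log every attempt, then fall back."""
    for attempt in range(1, attempts + 1):
        try:
            result = fn()
        except Exception as exc:
            state.trace.append(f"{name}: attempt {attempt} failed ({exc})")
        else:
            state.trace.append(f"{name}: attempt {attempt} ok")
            return result
    state.trace.append(f"{name}: fallback used")
    return fallback


def run_pipeline(question: str, research: Callable, reflect: Callable,
                 write: Callable, max_rounds: int = 3) -> Tuple[str, RunState]:
    state = RunState(question=question)
    for round_no in range(max_rounds):
        # Context: the researcher sees only the question and the round number.
        found = call_step("research", lambda: research(state.question, round_no),
                          state, fallback=[])
        state.evidence.extend(found)
        # Context: reflection sees the question and the collected evidence,
        # and decides whether another research round is needed.
        sufficient = call_step("reflect",
                               lambda: reflect(state.question, state.evidence),
                               state, fallback=False)
        if sufficient:
            break
    # Context: the writer receives the question and the evidence, not the trace.
    answer = call_step("write", lambda: write(state.question, state.evidence),
                       state, fallback="No answer drafted; human review needed.")
    return answer, state


# Stub steps standing in for model and tool calls (assumptions for the demo).
def demo_research(question: str, round_no: int) -> List[str]:
    if round_no == 0:
        raise TimeoutError("search timed out")  # simulate a flaky backend
    return [f"finding from round {round_no}"]


def demo_reflect(question: str, evidence: List[str]) -> bool:
    return len(evidence) >= 1


def demo_write(question: str, evidence: List[str]) -> str:
    return f"{question} -> answered from {len(evidence)} finding(s)"


answer, state = run_pipeline("example question", demo_research,
                             demo_reflect, demo_write)
print(answer)
print("\n".join(state.trace))
```

In this run the first research round fails twice and falls back to an empty result, reflection judges the evidence insufficient, and the second round succeeds. The trace records each of those events, which is the kind of record a human reviewer or monitoring system could inspect afterwards.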

While this post focuses on the technical architecture and engineering challenges, our paper published in Frontiers in Artificial Intelligence covers the product evolution and business impact in more detail.

The Challenge: Navigating the Preclinical Data Maze

The preclinical research landscape at Bayer, like many large pharmaceutical organizations, is characterized by a diverse and extensive array of data. This includes highly structured datasets from various studies, alongside vast amounts of unstructured information embedded within text documents such as study reports, publications, and regulatory submissions. Researchers...
