Bayer's PRINCE: a production agentic RAG system

Building Reliable Agentic AI Systems

A Case Study in building production-ready agentic AI systems

This paper presents the Preclinical Information Center (PRINCE), a cloud-hosted platform developed by Bayer AG with Thoughtworks to address pharmaceutical industry challenges in drug development. PRINCE leverages Agentic Retrieval-Augmented Generation and Text-to-SQL to integrate decades of safety study reports. We describe PRINCE's evolution from keyword-based search to an intelligent research assistant capable of answering complex questions and drafting regulatory documents. We reflect on key engineering decisions through the lens of context engineering (how information was shaped and routed between specialized agents) and harness engineering (how orchestration, recovery, and observability were built around the models to maintain control and reliability). The system prioritizes trust through transparency, explainability, and human-in-the-loop integration. PRINCE demonstrates AI's transformative potential in pharmaceuticals, significantly improving data accessibility and research efficiency while ensuring governance and compliance.

16 June 2026

Sarang Sanjay Kulkarni

Sarang Kulkarni is a Principal Consultant at Thoughtworks, working at the intersection of software engineering, data platforms, and applied AI. He focuses on building production-grade GenAI systems, particularly Retrieval-Augmented Generation (RAG) and multi-agent workflows, and helps teams take these systems from early ideas to real-world use. Sarang also contributes to Thoughtworks’ Global AI Service Development team and teaches an O’Reilly course on building production-ready RAG applications.

Contents

The Challenge: Navigating the Preclinical Data Maze

The Solution: PRINCE - An Evolutionary Platform

System Architecture: Engineering a Reliable Agentic RAG System

The Agentic RAG System

Clarify User Intent

Think & Plan: Process Reflection

The Researcher Agent

The Reflection Agent: Data Validation and Sufficiency

The Writer Agent: Answer Synthesis and Formatting

Building Trust in a Production LLM System

Transparency and Explainability

Evaluation

Monitoring

Engineering for Resilience: Error Handling and Recovery

Enhancing Data Quality: Named Entity Recognition and Annotation

The Journey Continues: Iterative Development

Conclusion

Preclinical drug discovery is inherently complex and data-intensive. Researchers face the significant challenge of efficiently accessing and analyzing vast volumes of information generated during this critical phase. Traditional keyword-based search methods, often reliant on rigid Boolean logic, frequently fall short when confronted with the nuanced and intricate nature of preclinical research questions.

The advent of Large Language Models (LLMs) has presented a transformative opportunity. By combining the generative power of LLMs with the precision of information retrieval systems, Retrieval-Augmented Generation (RAG) has emerged as a promising technique. This approach holds the potential to revolutionize preclinical data access, enabling researchers to pose complex questions in natural language and receive accurate, context-rich answers grounded in proprietary data.

Recognizing this potential early, Bayer committed to exploring how these technologies could address longstanding challenges in preclinical research.

In this post, we share that journey: how Bayer's early investment in generative AI has resulted in PRINCE, an agentic AI system built on Agentic RAG. This case study explores the technical architecture, engineering decisions, and lessons learned in transforming preclinical data retrieval from a challenging maze into an intuitive conversational experience.

Many of the engineering decisions behind PRINCE can now be understood through the lens of context engineering and harness engineering, although when the system was first designed we did not use these terms. Context engineering shaped what information each model received, what it did not receive, and how context moved between specialized steps such as research, reflection, and writing. Harness engineering shaped the scaffolding around the models: orchestration, tool boundaries, state persistence, retries, fallbacks, validation, reflection loops, observability, and human review.
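To make the distinction concrete, here is a minimal Python sketch of how the two ideas might fit together. It is an illustration under stated assumptions, not PRINCE's actual implementation: every name in it (`RunState`, `call_with_retries`, `run`, and the stub agents) is hypothetical, and plain functions stand in for LLM-backed agents. Context engineering appears as the narrow arguments each step receives; harness engineering appears as the retry, fallback, bounded reflection loop, and trace log around the steps.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    """Hypothetical per-question state persisted by the harness."""
    question: str
    evidence: list[str] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)  # observability log


def call_with_retries(name: str, step: Callable[[], list[str]],
                      fallback: Callable[[], list[str]],
                      trace: list[str], attempts: int = 2) -> list[str]:
    """Harness: retry a failing step, then fall back instead of crashing."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except RuntimeError as exc:
            trace.append(f"{name}: attempt {attempt} failed ({exc})")
    trace.append(f"{name}: using fallback")
    return fallback()


def run(question: str, researcher, reflector, writer, max_rounds: int = 3):
    """Orchestrate research -> reflection -> writing with a bounded loop."""
    state = RunState(question)
    for round_no in range(1, max_rounds + 1):
        # Context: the researcher sees the question and prior evidence only.
        found = call_with_retries(
            "researcher",
            lambda: researcher(state.question, state.evidence),
            lambda: [],  # fallback: contribute no new evidence this round
            state.trace,
        )
        state.evidence.extend(found)
        # Context: the reflector judges sufficiency; it never sees the trace.
        sufficient = reflector(state.question, state.evidence)
        state.trace.append(f"round {round_no}: sufficient={sufficient}")
        if sufficient:
            break
    # Context: the writer receives only the question and collected evidence.
    return writer(state.question, state.evidence), state.trace


# --- Stub agents (assumptions standing in for model calls) ---
def make_flaky_researcher():
    calls = {"n": 0}

    def researcher(question: str, evidence: list[str]) -> list[str]:
        calls["n"] += 1
        if calls["n"] == 1:  # simulate one transient backend failure
            raise RuntimeError("search backend timeout")
        return [f"finding-{len(evidence) + 1}"]

    return researcher


def reflector(question: str, evidence: list[str]) -> bool:
    return len(evidence) >= 2  # toy sufficiency rule


def writer(question: str, evidence: list[str]) -> str:
    return f"{question} -> based on {', '.join(evidence)}"


if __name__ == "__main__":
    answer, trace = run("Q", make_flaky_researcher(), reflector, writer)
    print(answer)
    for line in trace:
        print(line)
```

In this sketch the loop bound (`max_rounds`) and the empty-list fallback are deliberate design choices: a failing tool degrades the answer rather than aborting the run, and the trace records why, which is the kind of information a human reviewer would need.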

While this post focuses on the technical architecture and engineering challenges, our paper published in Frontiers in Artificial Intelligence covers the product evolution and business impact in more detail.

The Challenge: Navigating the Preclinical Data Maze

The preclinical research landscape at Bayer, as at many large pharmaceutical organizations, is characterized by a diverse and extensive array of data. This includes highly structured datasets from various studies, alongside vast amounts of unstructured information embedded within text documents such as study reports, publications, and regulatory submissions. Researchers...
