Emerging Patterns in Building GenAI Products

As we move software products using generative AI technology from proof-of-concepts into production systems, we are uncovering a range of common patterns. Evals play a central role in ensuring that these non-deterministic systems are operating within sensible boundaries. Large Language Models need enhancement to provide information beyond a generic and static training set. Most of the time we can do this with Retrieval Augmented Generation (RAG), although the basic RAG approach requires several patterns to overcome its limitations. When RAG isn't enough, Fine Tuning becomes worthwhile.

25 February 2025

Bharani Subramaniam

Bharani is CTO of Thoughtworks India and Middle East with a focus on business platforms and data engineering. He is a member of the Thoughtworks Technology Advisory Board and contributes to the creation of the Thoughtworks Technology Radar.

Martin Fowler

I've been educating professional software developers for three decades, and during that time I've seen many “game-changing developments”, most of which fizzle. I'm inclined to think that while there's a stunning amount of hype with AI, some of it will have a genuine impact. My part in this is to help my colleagues communicate what they have learned from day-to-day work with clients all over the world.

application architecture

generative AI

Contents

Direct Prompting ✣

Evals ✣

Scoring and Judging

Example

Running the Evals

Evals and Benchmarking

Embeddings ✣

Example Image Embedding

Embeddings in LLM

Retrieval Augmented Generation (RAG) ✣

RAG Template

RAG in Practice

Hybrid Retriever ✣

Query Rewriting ✣

Reranker ✣

Guardrails ✣

Guardrails using LLMs

Embeddings based guardrails

Rule based guardrails

Putting together a Realistic RAG

Fine Tuning ✣

Further Work

Sidebars

LLM benchmarks, evals and tests

The transition of Generative AI powered products from proof-of-concept to production has proven to be a significant challenge for software engineers everywhere. We believe that a lot of these difficulties come from folks thinking that these products are merely extensions to traditional transactional or analytical systems. In our engagements with this technology we've found that they introduce a whole new range of problems, including hallucination, unbounded data access and non-determinism.

We've observed our teams follow some regular patterns to deal with these problems. This article is our effort to capture these. These are early days for these systems: we are learning new things with every phase of the moon, and new tools flood our radar. As with any pattern, none of these are gold standards that should be used in all circumstances. The notes on when to use a pattern are often more important than the description of how it works.

In this article we describe the patterns briefly, interspersed with narrative text to better explain context and interconnections. We've identified the pattern sections with the “✣” dingbat. Any section that describes a pattern has the title surrounded by a single ✣. The pattern description ends with “✣ ✣ ✣”.

These patterns are our attempt to understand what we have seen in our engagements. There's a lot of research and tutorial writing on these systems out there, and some decent books are beginning to appear to act as general education on these systems and how to use them. This article is not an attempt to be such a general education. Rather, it's trying to organize the experience that our colleagues have had using these systems in the field. As such there will be gaps where we haven't tried some things, or we've tried them, but not enough to discern any useful pattern. As we work further we intend to revise and expand this material. As we extend this article we'll send updates to our usual feeds.

Patterns in this Article

Direct Prompting: Send prompts directly from the user to a Foundation LLM

Embeddings: Transform large data blocks into numeric vectors so that embeddings near each other represent related concepts

Evals: Evaluate the responses of an LLM in the context of a specific task

Fine Tuning: Carry out additional training to a pre-trained LLM to enhance its knowledge base for a particular context

Guardrails: Use separate LLM calls to avoid dangerous input to the LLM or to sanitize its results

Hybrid Retriever: Combine searches using embeddings with other search techniques

Query Rewriting: Use an LLM to create several alternative formulations of a query and search with all the alternatives

Reranker: Rank a set of retrieved document fragments according to their usefulness and send the best of them to the LLM

Retrieval Augmented Generation (RAG): Retrieve relevant document fragments and include these when prompting the LLM
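
To make two of the summaries above more concrete, here is a minimal sketch of how Embeddings and RAG fit together: fragments are turned into vectors, the fragments nearest the query are retrieved, and they are included in the prompt. Everything in it is an assumption made for illustration. `toy_embed` is a bag-of-words stand-in for a real embedding model, `FRAGMENTS` is a made-up corpus, and `build_prompt` only assembles the text that would be sent to an LLM; no actual model is called.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; a real system would index actual document fragments.
FRAGMENTS = [
    "Evals evaluate the responses of an LLM in the context of a specific task.",
    "Guardrails use separate LLM calls to avoid dangerous input to the LLM.",
    "A reranker ranks retrieved document fragments by usefulness.",
]


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, fragments: list[str], k: int = 1) -> list[str]:
    """Return the k fragments whose vectors are closest to the query vector."""
    query_vec = toy_embed(query)
    ranked = sorted(
        fragments,
        key=lambda frag: cosine(query_vec, toy_embed(frag)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, fragments: list[str], k: int = 1) -> str:
    """Assemble a prompt that includes the retrieved fragments as context."""
    context = "\n".join(retrieve(query, fragments, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("What does a reranker do?", FRAGMENTS))
```

Word counts only capture literal word overlap, so this sketch would miss a query phrased with synonyms. That gap is the reason for using learned embeddings in practice.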

Direct Prompting

Send prompts directly from the user to a Foundation LLM

The most basic approach to using an LLM is to connect an off-the-shelf LLM directly to a user, allowing the user to type...
