Emerging Patterns in Building GenAI Products
As we move software products using generative AI technology from proof-of-concepts into production systems, we are uncovering a range of common patterns. Evals play a central role in ensuring that these non-deterministic systems are operating within sensible boundaries. Large Language Models need enhancement to provide information beyond a generic and static training set. Most of the time we can do this with Retrieval Augmented Generation (RAG), although the basic RAG approach requires several patterns to overcome its limitations. When RAG isn't enough, Fine Tuning becomes worthwhile.
25 February 2025
Bharani Subramaniam
Bharani is CTO of Thoughtworks India and Middle East with a focus on business platforms and data engineering. He is a member of the Thoughtworks Technology Advisory Board and contributes to the creation of the Thoughtworks Technology Radar.
Martin Fowler
I've been educating professional software developers for three decades, and during that time I've seen many “game-changing developments”, most of which fizzle. I'm inclined to think that while there's a stunning amount of hype with AI, some of it will have a genuine impact. My part in this is to help my colleagues communicate what they have learned from day-to-day work with clients all over the world.
application architecture
generative AI
Contents
Direct Prompting ✣
Evals ✣
Scoring and Judging
Example
Running the Evals
Evals and Benchmarking
Embeddings ✣
Example Image Embedding
Embeddings in LLM
Retrieval Augmented Generation (RAG) ✣
RAG Template
RAG in Practice
Hybrid Retriever ✣
Query Rewriting ✣
Reranker ✣
Guardrails ✣
Guardrails using LLMs
Embeddings based guardrails
Rule based guardrails
Putting together a Realistic RAG
Fine Tuning ✣
Further Work
Sidebars
LLM benchmarks, evals and tests
The transition of Generative AI-powered products from proof-of-concept to production has proven to be a significant challenge for software engineers everywhere. We believe that many of these difficulties come from people treating these products as mere extensions of traditional transactional or analytical systems. In our engagements with this technology we've found that they introduce a whole new range of problems, including hallucination, unbounded data access and non-determinism.
We've observed our teams follow some regular patterns to deal with these problems. This article is our effort to capture these. These are early days for these systems: we are learning new things with every phase of the moon, and new tools flood our radar. As with any pattern, none of these are gold standards that should be used in all circumstances. The notes on when to use a pattern are often more important than the description of how it works.
In this article we describe the patterns briefly, interspersed with narrative text to better explain context and interconnections. We've marked the pattern sections with the “✣” dingbat: any section that describes a pattern has its title surrounded by a single ✣, and the pattern description ends with “✣ ✣ ✣”.
These patterns are our attempt to understand what we have seen in our engagements. There's a lot of research and tutorial writing on these systems out there, and some decent books are beginning to appear to act as general education on these systems and how to use them. This article is not an attempt to be such a general education; rather, it tries to organize the experience that our colleagues have had using these systems in the field. As such there will be gaps where we haven't tried some things, or where we've tried them but not enough to discern any useful pattern. As we work further we intend to revise and expand this material. As we extend this article we'll send updates to our usual feeds.
Patterns in this Article
Direct Prompting: Send prompts directly from the user to a Foundation LLM
Embeddings: Transform large data blocks into numeric vectors so that embeddings near each other represent related concepts
Evals: Evaluate the responses of an LLM in the context of a specific task
Fine Tuning: Carry out additional training to a pre-trained LLM to enhance its knowledge base for a particular context
Guardrails: Use separate LLM calls to avoid dangerous input to the LLM or to sanitize its results
Hybrid Retriever: Combine searches using embeddings with other search techniques
Query Rewriting: Use an LLM to create several alternative formulations of a query and search with all the alternatives
Reranker: Rank a set of retrieved document fragments according to their usefulness and send the best of them to the LLM
Retrieval Augmented Generation (RAG): Retrieve relevant document fragments and include these when prompting the LLM
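To give a rough feel for how a few of these patterns relate, here is a minimal Python sketch that combines the Embeddings idea (vectors near each other represent related concepts) with the retrieval and prompt-assembly steps of RAG. It is an illustration under stated assumptions, not how any particular product does it: the three-dimensional vectors are hand-written stand-ins for what an embedding model would produce, the fragments are invented, and the names `cosine_similarity`, `retrieve` and `build_prompt` are hypothetical helpers. No LLM is called; the sketch stops at the prompt that would be sent.

```python
import math

# Toy, hand-written 3-dimensional "embeddings" (hypothetical values chosen for
# illustration; a real system would obtain vectors from an embedding model).
FRAGMENTS = {
    "Refunds are processed within 5 business days.": [0.9, 0.1, 0.0],
    "Our office is closed on public holidays.": [0.1, 0.9, 0.1],
    "Refund requests need an order number.": [0.8, 0.2, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, fragments, top_k=2):
    """Return the top_k fragment texts whose embeddings lie nearest the query."""
    ranked = sorted(
        fragments.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, retrieved):
    """Assemble a RAG-style prompt: retrieved fragments first, then the question."""
    context = "\n".join(f"- {fragment}" for fragment in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical embedding of the question below (assumed, not computed by a model).
query_vector = [0.85, 0.15, 0.05]
retrieved = retrieve(query_vector, FRAGMENTS)
prompt = build_prompt("How long do refunds take?", retrieved)
print(prompt)
```

In this toy setup the two refund-related fragments sit closest to the query vector, so they are the ones placed in the prompt, while the unrelated fragment about public holidays is left out.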
Direct Prompting
Send prompts directly from the user to a Foundation LLM
The most basic approach to using an LLM is to connect an off-the-shelf LLM directly to a user, allowing the user to type...