LLM layer for a Rails application | dmitrytsepelev.dev
Like it or not, a lot of applications are adding AI-native features: anything related to automated answers, object classification, knowledge base search, or text summarization can already be handed off to an LLM with pretty good results. If you happen to do this as a Rails engineer, this post will definitely be useful.
In this post I will describe my approach to LLM integration for Rails applications. We will discuss some common problems, explore related gems, build our own architecture layer for LLM integration, cover it with specs, and discuss ways to prepare the context.
Why we need a layer
Integrating an LLM into a Rails app at the early stages usually does not differ much from connecting any other API: we make a call with some parameters and get the response back, which is then used in the business layer. Of course, we should not forget to handle errors, move the interaction itself to the background, and so on. Nothing unusual so far.
Soon it turns out that things are not that simple: even though it’s nominally the same call, the parameters differ a lot from case to case, and preparing them requires separate work. One of the most important parameters is the prompt: we need to explain to the LLM what we actually want from it. For simple, short prompts, string interpolation is enough, but you’ll quickly outgrow it.
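One way to move past interpolation is to keep the prompt in an ERB template. Here is a minimal sketch that uses only Ruby's standard library; the TICKET_PROMPT template, the render_prompt helper, and the category list are all made up for illustration and do not come from any gem.

```ruby
require "erb"

# Hypothetical template; in a real app it would probably live in its own file.
TICKET_PROMPT = <<~PROMPT
  Classify the support ticket below into one of these categories:
  <%- categories.each do |category| -%>
  - <%= category %>
  <%- end -%>

  Ticket text:
  <%= text %>
PROMPT

# Renders the template with the given locals.
# trim_mode "-" lets `<%-` and `-%>` swallow the whitespace around control tags.
def render_prompt(template, **locals)
  ERB.new(template, trim_mode: "-").result_with_hash(locals)
end

prompt = render_prompt(
  TICKET_PROMPT,
  categories: %w[billing bug other],
  text: "I was charged twice"
)
puts prompt
```

The template stays readable when loops or conditionals show up, and the locals make it explicit which data the prompt depends on.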
Error handling also becomes complicated and verbose: on top of network and server errors, an incorrect response (from the business logic standpoint) can come back, so we need to add validations.
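To make the validation part concrete, here is a rough sketch of a business-level check on top of an already parsed response. Every name in it (InvalidLLMResponse, ALLOWED_CATEGORIES, validate_ticket_response!, with_validation_retries) is hypothetical, and the block passed to the helper stands in for a real LLM call.

```ruby
# Hypothetical error: the response is well-formed, but wrong for the business logic.
class InvalidLLMResponse < StandardError; end

# Assumed list of categories the application knows how to handle.
ALLOWED_CATEGORIES = %w[billing bug other].freeze

# Returns the content unchanged when it passes the checks, raises otherwise.
def validate_ticket_response!(content)
  category = content["category"]
  unless ALLOWED_CATEGORIES.include?(category)
    raise InvalidLLMResponse, "unknown category: #{category.inspect}"
  end
  raise InvalidLLMResponse, "summary is blank" if content["summary"].to_s.strip.empty?

  content
end

# Runs the block (a stand-in for the real LLM call) and validates its result,
# calling the block again when validation fails, up to `attempts` times in total.
def with_validation_retries(attempts: 2)
  tries = 0
  begin
    tries += 1
    validate_ticket_response!(yield)
  rescue InvalidLLMResponse
    retry if tries < attempts
    raise
  end
end

# Stubbed responses instead of a network call: the first one is invalid.
stubbed = [
  { "category" => "refund", "summary" => "Charged twice" },
  { "category" => "billing", "summary" => "Charged twice" }
]
result = with_validation_retries { stubbed.shift }
puts result["category"]
```

Even in this toy version the call site carries retry and validation logic that has nothing to do with tickets, which is exactly the kind of noise that piles up around every LLM call.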
At this point, an experienced engineer starts looking for libraries that would take at least some of these routine tasks off their hands. After some time working with the raw OpenAI adapter, I ended up with the following list of goals:
make it easier to support and replace models/providers;
separate the LLM interaction code from the business logic;
get rid of all the boilerplate (storing schemas/instructions, preparing templates);
have centralized logging in place, since you often want to inspect the “raw” response from the model when behavior is unexpected.
Library choice
Finding something isn’t hard: ruby_llm, activeagent, and a number of smaller solutions offer different levels of abstraction. In this post I will tell you which option I ended up with: ruby_llm as a transport plus my own layer on top.
While I was working on the post, I found ruby_llm-agents, which is pretty similar to what I came up with. How could I miss it? It was released in January, and I was working on this in October.
Moreover, I discovered that ruby_llm has shipped a similar DSL too. Fortunately my approach is a bit different, otherwise you would not be reading this post!
A quick tour of ruby_llm
ruby_llm is a library for working with different LLM providers (OpenAI, Anthropic, Google, and others). The interface for each model is similar, and some common tasks (e.g., error handling) are implemented right inside the library.
The main abstraction is a chat, which represents a single conversation with the LLM (and can include more than one message from us). The chat is configured through a chain of calls. For instance, with_instructions sets the system prompt, and with_schema enables structured output: the model is required to return JSON strictly following the specified JSON Schema.
class TicketSchema < RubyLLM::Schema
  string :category
  string :summary
end

chat = RubyLLM.chat(model: "gpt-4o")
chat.with_schema(TicketSchema)
chat.with_instructions("Classify the ticket.")

response = chat.ask(ticket.text)
response.content # => {"category" => "billing", "summary" => "..."}
Note that response.content is already parsed according to the schema!
Yes, this also means JSON Schema validation comes for free—one less thing to write yourself.
The next useful feature is persistence. We can save all our chats to the database. To do that, we generate the tables and models using ruby_llm’s generator and slightly adjust our code: instead of RubyLLM.chat we use Chat.create!, and everything just works.
Chat.create!(model: "gpt-4o")
  .with_instructions(instructions)
  .with_schema(output_schema)
  .ask(prompt)
If you decide to use persistence, think about two things:
Chat and Message often carry some kind of business context, so you might want to rename the models and/or move them to a namespace—better do it right away;
these tables are going to be big. Really. Think about partitioning or storing them somewhere outside the main DB.
Don’t say I didn’t warn you when the messages table hits 10M rows.
Designing the base class
Each LLM call should be wrapped in a separate class that inherits from a base class (let’s call it BaseLLMRequest). All the boilerplate and instrumentation lives in the base class, while subclasses only configure the request parameters. The base class can be implemented like...