Building a Personal RAG Chatbot in a Few Days: Learning by Engineering

Recently, I built a small personal Retrieval-Augmented Generation (RAG) chatbot. It was not a long research project. It was not something I spent weeks architecting. And it definitely was not built from years of prior AI engineering experience.

It was a compact engineering exercise built over a few days, mostly through reading, experimenting, and applying engineering fundamentals to a domain I had not worked in before. That is exactly why I wanted to write about it.

This project reminded me of something I strongly believe:

Engineering is rarely about already knowing the exact technology. It is about being able to decompose unfamiliar systems fast enough to build useful solutions.

The goal was simple. I wanted a chatbot that could answer questions using my own technical writing and project documentation instead of relying purely on generic model knowledge.

The Problem

Large language models are powerful. But they have an obvious limitation when you want domain-specific answers. If someone asks:

“Tell me about your backend engineering experience”

a general model can generate something plausible. But plausibility is not accuracy. I wanted responses grounded in:

- My project documentation
- Technical notes
- Markdown-based writing
- Structured knowledge I control

This meant I needed retrieval. Instead of expecting the model to already know my data, I wanted it to fetch relevant context dynamically. That naturally led to Retrieval-Augmented Generation.

Why RAG?

The obvious alternative was fine-tuning. At first glance, that sounds attractive. Train the model directly on your data and let it internalize your knowledge. But for this use case, it would have introduced unnecessary complexity:

- Longer experimentation cycles
- Retraining after updates
- Higher compute cost
- Harder debugging

RAG offered something much simpler. It separates knowledge storage from generation. That means updating the system is as simple as updating documents and re-indexing. No retraining required. For a lightweight personal knowledge system, that architectural simplicity mattered.

System Architecture

The system follows a simple pipeline:

1. Markdown Documents
2. Document Parser
3. Chunking
4. Embeddings
5. PostgreSQL Storage
6. Semantic Retrieval
7. Prompt Construction
8. LLM Response
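
As a rough illustration, the indexing half of this pipeline (parser, chunking, embeddings, storage) might look like the sketch below. Everything named here is hypothetical: `parse_sections`, `chunk_section`, `embed`, `index_document`, and `IndexedChunk` are illustrative names, the hashed bag-of-words vector only stands in for a real embedding model, and a plain Python list stands in for the PostgreSQL table. It also assumes Markdown headings mark section boundaries and does not special-case `#` lines inside code fences.

```python
import hashlib
import math
import re
from dataclasses import dataclass

DIM = 64  # toy embedding size, chosen arbitrarily for this sketch


@dataclass
class IndexedChunk:
    heading: str         # nearest Markdown heading above the chunk
    text: str            # chunk body
    vector: list[float]  # unit-length toy embedding


def parse_sections(markdown: str) -> list[tuple[str, str]]:
    """Document Parser: split Markdown into (heading, body) pairs."""
    sections, heading, body = [], "", []
    for line in markdown.splitlines():
        match = re.match(r"#{1,6}\s+(.+)", line)
        if match:
            # A new heading closes the section collected so far.
            sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body).strip()))
    return [(h, b) for h, b in sections if b]  # drop empty sections


def chunk_section(body: str) -> list[str]:
    """Chunking: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]


def embed(text: str) -> list[float]:
    """Embeddings: hashed bag-of-words, a stand-in for a real model."""
    vector = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def index_document(markdown: str, store: list[IndexedChunk]) -> None:
    """Storage: append rows to a list standing in for a database table."""
    for heading, body in parse_sections(markdown):
        for text in chunk_section(body):
            store.append(IndexedChunk(heading, text, embed(text)))
```

Calling `index_document(text, store)` on a Markdown string fills `store` with one row per paragraph, each tagged with the heading it appeared under. Because every stage is its own function, any one of them can be swapped (for example, replacing `embed` with a call to an actual embedding model) without touching the others.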

Each layer has one responsibility. This separation made iteration much easier. If responses were weak, I could inspect retrieval. If retrieval was weak, I could inspect chunking. Keeping concerns isolated made debugging straightforward.

Why FastAPI

Coming from Flask and Django experience, FastAPI felt like the right tool. It provided:

- Strong request validation
- Async support
- Clean structure
- Explicit typing

A typical request model looked like:

    from pydantic import BaseModel

    # Request model: FastAPI validates incoming JSON against these typed fields.
    class ChatRequest(BaseModel):
        message: str
        user_hash: str

FastAPI also made the project easy to organize:

    app/
        api/
        services/
        models/
        retrieval/
        config/

That modularity became very useful as the system evolved.

Why PostgreSQL

A common question is: Why not use a dedicated vector database? Tools like Pinecone and Weaviate are excellent. But this project had different priorities. I wanted:

- Low operational complexity
- Minimal infrastructure overhead
- Familiar tooling
- Easy deployment

PostgreSQL offered the right balance. This project was about understanding retrieval mechanics, not building hyperscale search infrastructure. It reinforced an important engineering lesson:

The best tool is often the simplest one that solves the actual problem.

The Real Challenge: Chunking

One of the most interesting lessons was how important chunking is. At first, I tried fixed-length chunking. It worked, but retrieval quality was inconsistent. Why? Because semantic meaning often spans logical sections. Breaking content purely by character count often destroys context. A better approach was preserving:

- Section boundaries
- Paragraph grouping
- Topic continuity

This dramatically improved retrieval quality. It quickly became clear that many “model quality” problems are actually retrieval preparation problems.

Retrieval Flow

When a user sends a query, the system follows this process:

1. Validate request. The API receives and normalizes the query.
2. Generate query embeddings. The query is converted into a vector representation.
3. Search semantically similar chunks. PostgreSQL retrieves the closest matches.
4. Construct prompt context. Relevant chunks are assembled into the context window.
5. Generate response. The model responds using the retrieved context.

Conceptually simple. Practically, the challenge is tuning each stage well enough that the final context remains useful.

Prompt Engineering

Retrieval alone is not enough. The model still needs behavioral constraints. My early prompts were too permissive. That caused:

- Unsupported assumptions
- Overconfident answers
- Context stretching

The solution was stronger grounding rules:

Answer only using...
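
To make steps 1 through 4 of the retrieval flow described earlier concrete, here is a minimal sketch under stated assumptions. Word-count vectors stand in for real embeddings, an in-memory list stands in for the PostgreSQL search, and step 5 (the LLM call) is left out. The function names and the wording of the grounding rules in `build_prompt` are illustrative placeholders, not the prompt or code actually used in this project.

```python
import math
import re
from collections import Counter


def normalize(query: str) -> str:
    """Step 1: collapse whitespace; a real API would also validate the request."""
    return " ".join(query.split())


def embed(text: str) -> Counter:
    """Step 2 (toy): word counts as a sparse vector, not a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Step 3: rank chunks by similarity to the query and keep the best top_k."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 4: combine grounding rules, retrieved chunks, and the question."""
    # Placeholder rule text for illustration only.
    rules = (
        "Use only the context below. "
        "If the context does not cover the question, say you do not know."
    )
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return f"{rules}\n\nContext:\n{numbered}\n\nQuestion: {normalize(query)}"
```

A call such as `build_prompt(q, retrieve(q, chunks))` yields the string that would be sent to the model. Numbering the context chunks is one simple way to make it easier to check which retrieved passage an answer leaned on.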
