Building a Personal RAG Chatbot in a Few Days: Learning by Engineering<br>Recently, I built a small personal Retrieval-Augmented Generation (RAG) chatbot.<br>It was not a long research project.<br>It was not something I spent weeks architecting.<br>And it definitely was not built from years of prior AI engineering experience.<br>It was a compact engineering exercise built over a few days, mostly through reading, experimenting, and applying engineering fundamentals to a domain I had not worked in before.<br>That is exactly why I wanted to write about it.<br>This project reminded me of something I strongly believe:<br>Engineering is rarely about already knowing the exact technology.<br>It is about being able to decompose unfamiliar systems fast enough to build useful solutions.
The goal was simple.<br>I wanted a chatbot that could answer questions using my own technical writing and project documentation instead of relying purely on generic model knowledge.<br>The Problem<br>Large language models are powerful.<br>But they have an obvious limitation when you want domain-specific answers.<br>If someone asks:<br>“Tell me about your backend engineering experience”<br>a general model can generate something plausible.<br>But plausibility is not accuracy.<br>I wanted responses grounded in:<br>My project documentation<br>Technical notes<br>Markdown-based writing<br>Structured knowledge I control<br>This meant I needed retrieval.<br>Instead of expecting the model to already know my data, I wanted it to fetch relevant context dynamically.<br>That naturally led to Retrieval-Augmented Generation.<br>Why RAG?<br>The obvious alternative was fine-tuning.<br>At first glance, that sounds attractive.<br>Train the model directly on your data and let it internalize your knowledge.<br>But for this use case, it would have introduced unnecessary complexity:<br>Longer experimentation cycles<br>Retraining after updates<br>Higher compute cost<br>Harder debugging<br>RAG offered something much simpler.<br>It separates knowledge storage from generation.<br>That means updating the system is as simple as updating documents and re-indexing.<br>No retraining required.<br>For a lightweight personal knowledge system, that architectural simplicity mattered.<br>System Architecture<br>The system follows a simple pipeline:<br>Markdown Documents<br>Document Parser<br>Chunking<br>Embeddings<br>PostgreSQL Storage<br>Semantic Retrieval<br>Prompt Construction<br>LLM Response
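The stages above can be sketched end to end in a few small functions. This is a minimal illustration, not the project's actual code: every function name is hypothetical, the "embedding" is a toy bag-of-words count standing in for a real embedding model, an in-memory list stands in for PostgreSQL, and the final LLM call is left out.

```python
import re
from collections import Counter


def parse_markdown(text: str) -> list[str]:
    """Document parser: split a Markdown string into sections at heading lines."""
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    return [s.strip() for s in sections if s.strip()]


def chunk(sections: list[str]) -> list[str]:
    """Chunking: one chunk per section (a deliberately simple stand-in)."""
    return sections


def embed(text: str) -> Counter:
    """Embeddings: toy word counts, NOT a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Storage: an in-memory list standing in for PostgreSQL."""
    return [(c, embed(c)) for c in chunks]


def retrieve(store: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Retrieval: rank chunks by how many query words they share."""
    q = embed(query)

    def score(item: tuple[str, Counter]) -> int:
        _, vec = item
        return sum(min(q[w], vec[w]) for w in q)

    ranked = sorted(store, key=score, reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(context: list[str], question: str) -> str:
    """Prompt construction: place retrieved chunks ahead of the question."""
    joined = "\n\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


docs = "# Backend\nI built APIs with Flask and Django.\n\n# Hobbies\nI enjoy hiking and chess."
store = index(chunk(parse_markdown(docs)))
question = "Which APIs did you build with Django?"
prompt = build_prompt(retrieve(store, question, k=1), question)
```

Because each stage is a separate function, any one of them can be swapped or inspected in isolation without touching the others.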
Each layer has one responsibility.<br>This separation made iteration much easier.<br>If responses were weak, I could inspect retrieval.<br>If retrieval was weak, I could inspect chunking.<br>Keeping concerns isolated made debugging straightforward.<br>Why FastAPI<br>Coming from Flask and Django experience, FastAPI felt like the right tool.<br>It provided:<br>Strong request validation<br>Async support<br>Clean structure<br>Explicit typing<br>A typical request model looked like:<br>from pydantic import BaseModel<br><br>class ChatRequest(BaseModel):<br>    message: str<br>    user_hash: str
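To make "strong request validation" concrete: when a Pydantic model like this is declared as a FastAPI request body, payloads that do not match it are rejected before the handler runs. The sketch below exercises the model directly with Pydantic, outside any FastAPI app. The helper `parse_chat_request` is a hypothetical name used only for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ChatRequest(BaseModel):
    message: str
    user_hash: str


def parse_chat_request(payload: dict) -> Optional[ChatRequest]:
    """Return a validated ChatRequest, or None if the payload is invalid.

    Hypothetical helper: inside FastAPI this check happens automatically
    when ChatRequest is used as the type of a request body parameter.
    """
    try:
        return ChatRequest(**payload)
    except ValidationError:
        return None


valid = parse_chat_request({"message": "hello", "user_hash": "abc123"})
invalid = parse_chat_request({"message": "hello"})  # user_hash is missing
```

Here `valid` is a `ChatRequest` instance and `invalid` is `None`, because the required `user_hash` field was not supplied.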
FastAPI also made the project easy to organize:<br>app/<br>    api/<br>    services/<br>    models/<br>    retrieval/<br>    config/
That modularity became very useful as the system evolved.<br>Why PostgreSQL<br>A common question is:<br>Why not use a dedicated vector database?<br>Tools like Pinecone and Weaviate are excellent.<br>But this project had different priorities.<br>I wanted:<br>Low operational complexity<br>Minimal infrastructure overhead<br>Familiar tooling<br>Easy deployment<br>PostgreSQL offered the right balance.<br>This project was about understanding retrieval mechanics, not building hyperscale search infrastructure.<br>It reinforced an important engineering lesson:<br>The best tool is often the simplest one that solves the actual problem.
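Since the stated goal was understanding retrieval mechanics, it helps to see what a similarity search computes, whatever stores the vectors. The post does not say how embeddings are stored or compared in PostgreSQL, so the sketch below is an assumption: it ranks stored vectors by cosine similarity using a brute-force scan in plain Python. The names `cosine_similarity` and `top_k` and the `(chunk_id, vector)` row shape are hypothetical. In PostgreSQL itself, this ranking is commonly delegated to an extension such as pgvector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          rows: list[tuple[str, list[float]]],
          k: int = 3) -> list[str]:
    """Return the ids of the k rows most similar to the query vector.

    rows is assumed to be (chunk_id, embedding) pairs, for example the
    result of selecting every stored chunk; a real system would let the
    database do this ranking instead of scanning in application code.
    """
    scored = [(cosine_similarity(query_vec, vec), chunk_id) for chunk_id, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
closest = top_k([1.0, 0.1], rows, k=2)
```

For a personal knowledge base with a few hundred chunks, even a full scan like this is fast enough to experiment with, which fits the low-infrastructure priorities above.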
The Real Challenge: Chunking<br>One of the most interesting lessons was how important chunking is.<br>At first, I tried fixed-length chunking.<br>It worked, but retrieval quality was inconsistent.<br>Why?<br>Because semantic meaning often spans logical sections.<br>Breaking content purely by character count often destroys context.<br>A better approach was preserving:<br>Section boundaries<br>Paragraph grouping<br>Topic continuity<br>This dramatically improved retrieval quality.<br>It quickly became clear that many “model quality” problems are actually retrieval preparation problems.<br>Retrieval Flow<br>When a user sends a query, the system follows this process:<br>1. Validate request<br>The API receives and normalizes the query.<br>2. Generate query embeddings<br>The query is converted into a vector representation.<br>3. Search semantically similar chunks<br>PostgreSQL retrieves the closest matches.<br>4. Construct prompt context<br>Relevant chunks are assembled into the context window.<br>5. Generate response<br>The model responds using retrieved context.<br>Conceptually simple.<br>Practically, the challenge is tuning each stage well enough that the final context remains useful.<br>Prompt Engineering<br>Retrieval alone is not enough.<br>The model still needs behavioral constraints.<br>My early prompts were too permissive.<br>That caused:<br>Unsupported assumptions<br>Overconfident answers<br>Context stretching<br>The solution was stronger grounding rules:<br>Answer only using...