sinsesgo: An autonomous daily briefing on Spanish media bias | Alberto Pou
April 2026 - Alberto Pou
For years I’ve read Spanish news with a thumb on the scroll bar and one eye on the source. The same event can read like a triumph in one outlet and a scandal in another, and that gap keeps widening. What I wanted was simple: a tool that reads the same story across the spectrum, surfaces what everyone agrees on, and highlights what each side leaves out.
Ground News does this for the English-speaking world, and its daily briefing was the inspiration. But Ground News does not cover Spanish outlets in any meaningful depth, and most of its bias signal comes from the source label, not from the content of each article. I wanted something focused on Spain that read every article in full and ran on its own every afternoon without me touching it.
That project is sinsesgo. Throughout the day, a cron job pulls articles from 18 Spanish outlets; every afternoon, a second job clusters them by topic, picks the five most-covered stories of the day, analyzes them through a pipeline of agents, and builds a briefing that contrasts how the left, center and right framed each one. No login, no paywall, no human in the loop.
This post is for technical readers who land on the funcionamiento ("how it works") page and want the long version. I will walk through the pipeline end to end: data ingestion, embeddings, clustering, and the agents that do the analysis. I will also explain a second use case that lives in the codebase but does not run in production, and why.
Why I built it
Two reasons converged on the same project.
The first is professional. I work with LLMs every day and I wanted to spend time inside the agent ecosystem: orchestration graphs, retries, structured outputs, multi-model routing, RAG over a real corpus. Reading posts and tutorials only takes you so far. Building a pipeline that has to run unattended every day, parse messy real-world inputs, and produce something a stranger can read forces you to learn the parts that toy projects skip.
The second is personal. I want to read the news without picking a side first. Every Spanish outlet I open has a clear lean, and switching between two of them does not give you the truth; it gives you two opinions and a headache. What I actually want is a single place that reads the same story across the spectrum, tells me which facts every outlet agrees on, and lists the ones each side leaves out. That tool did not exist for Spanish media, so I built it.
sinsesgo goes deeper than Ground News. It does not just signal bias at the outlet level; it reads every article in full and runs it through a chain of agents that examine the headline framing, the logical fallacies, the sources quoted, and the specific facts the outlet chose to omit. Focusing on a single country is what makes that level of granularity possible across Spain’s media landscape.
The pipeline
The system runs four stages in sequence. Each stage feeds the next.
╭──────────────────────────╮
│ ① Data ingestion         │
╰────────────┬─────────────╯
             ▼
╭──────────────────────────╮
│ ② Embeddings             │
╰────────────┬─────────────╯
             ▼
╭──────────────────────────╮
│ ③ Topic clustering       │
╰────────────┬─────────────╯
             ▼
╭──────────────────────────╮
│ ④ Agent pipeline         │
╰──────────────────────────╯
The whole pipeline is a Django project with PostgreSQL and pgvector, orchestrated with LangGraph. Two cron jobs on Render trigger the stages: ingestion runs several times a day, and the briefing runs once every afternoon.
Stage 1: Data ingestion
First problem: RSS feeds are chaos. Some outlets publish full bodies. Others truncate to 200 characters and force a scrape. A few publish broken XML. The ingestion command handles this by walking every outlet, fetching the feed, detecting and recovering from malformed XML, and filtering out sports, gossip and lifestyle noise.
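The ingestion command itself is not shown in this post, so what follows is only a minimal sketch of the feed-handling part, written in Python (the language of the Django project) with the standard library alone. The recovery strategy (strip illegal control characters, escape bare ampersands, retry once), the section names in `NOISE_SECTIONS` and all helper names are hypothetical; fetching over HTTP and the full-body scrape are left out.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical section names for sports, gossip and lifestyle noise.
NOISE_SECTIONS = {"deportes", "futbol", "corazon", "estilo", "gente"}

# Many feeds put the author in Dublin Core <dc:creator> instead of <author>.
DC_CREATOR = "{http://purl.org/dc/elements/1.1/}creator"

# Control characters that are illegal in XML 1.0 (tab, LF and CR are allowed).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
# An "&" that does not start a character or entity reference.
_BARE_AMPERSAND = re.compile(r"&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)")


def parse_feed(raw: str) -> ET.Element:
    """Parse RSS XML; on failure, clean two common defects and retry once.

    Raises ET.ParseError if the feed is still malformed after cleanup.
    """
    try:
        return ET.fromstring(raw)
    except ET.ParseError:
        cleaned = _CONTROL_CHARS.sub("", raw)
        cleaned = _BARE_AMPERSAND.sub("&amp;", cleaned)
        return ET.fromstring(cleaned)


def is_noise(link: str, categories: list[str]) -> bool:
    """True if a URL path segment or a <category> names a filtered section."""
    segments = {s.lower() for s in urlparse(link).path.split("/") if s}
    labels = {c.strip().lower() for c in categories}
    return bool(NOISE_SECTIONS & (segments | labels))


def extract_items(raw: str) -> list[dict]:
    """Turn one feed into raw records: URL, headline, snippet, author, date."""
    items = []
    for item in parse_feed(raw).iter("item"):
        link = (item.findtext("link") or "").strip()
        categories = [c.text or "" for c in item.findall("category")]
        if not link or is_noise(link, categories):
            continue
        items.append({
            "url": link,
            "headline": (item.findtext("title") or "").strip(),
            "snippet": (item.findtext("description") or "").strip(),
            "author": (item.findtext("author") or item.findtext(DC_CREATOR) or "").strip(),
            "date": (item.findtext("pubDate") or "").strip(),
        })
    return items
```

Trying a strict parse first and cleaning only on failure means well-formed feeds are never rewritten; a real reader would probably need more repair rules than these two.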
What survives is a raw record with URL, headline, snippet, author and date. The next step scrapes the full article body and writes the record to the database. Each outlet carries a bias score I curated by hand, from -1 (far left) to +1 (far right).
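A hedged sketch of the records this step could produce, using plain dataclasses in place of the actual Django models, which the post does not show. The `bias_bucket` helper and its 0.33 cutoff are hypothetical: they only illustrate one way a continuous score from -1 to +1 could be mapped onto the left, center and right groups the briefing contrasts.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Outlet:
    name: str
    bias: float  # hand-curated: -1.0 (far left) ... +1.0 (far right)

    def __post_init__(self) -> None:
        if not -1.0 <= self.bias <= 1.0:
            raise ValueError(f"bias for {self.name} must be in [-1, 1], got {self.bias}")


@dataclass
class RawArticle:
    outlet: Outlet
    url: str
    headline: str
    snippet: str
    author: str
    published: datetime
    body: Optional[str] = None  # filled in later by the scraping step


def bias_bucket(bias: float, threshold: float = 0.33) -> str:
    """Map a continuous outlet score to a left/center/right group.

    The default threshold is a hypothetical choice, not sinsesgo's actual cutoff.
    """
    if bias <= -threshold:
        return "left"
    if bias >= threshold:
        return "right"
    return "center"
```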
Stage 2: Embeddings
I store a 1536-dimensional vector per article in a pgvector column. The model is OpenAI text-embedding-3-small. The text I embed is the longest one available: the full body, falling back to the snippet and then to the headline.
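The fallback order can be sketched as follows. `text_for_embedding` and `embed_article` are hypothetical helper names; the embedding call follows the `client.embeddings.create(model=..., input=...)` shape of the official `openai` Python package (v1 and later), with the client passed in as a parameter so the snippet itself does not depend on the library.

```python
from typing import Any, Optional

EMBEDDING_MODEL = "text-embedding-3-small"  # returns 1536 dimensions by default


def text_for_embedding(body: Optional[str], snippet: Optional[str], headline: str) -> str:
    """Pick the full body, else the snippet, else the headline (first non-empty)."""
    for candidate in (body, snippet, headline):
        if candidate and candidate.strip():
            return candidate.strip()
    raise ValueError("article has no text to embed")


def embed_article(client: Any, body: Optional[str], snippet: Optional[str], headline: str) -> list[float]:
    """Embed one article with an OpenAI client created elsewhere, e.g. openai.OpenAI()."""
    text = text_for_embedding(body, snippet, headline)
    response = client.embeddings.create(model=EMBEDDING_MODEL, input=text)
    return response.data[0].embedding
```

Since the default output width is 1536, the vector fits the pgvector column as-is with no `dimensions` override.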
text-embedding-3-small is cheap (about $0.02 per million tokens) and good enough for clustering articles by topic.
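To put that price in perspective, here is a back-of-the-envelope estimate under a purely hypothetical volume of 1,000 articles a day at roughly 1,000 tokens each (the post gives neither figure):

```latex
\[
1000\,\tfrac{\text{articles}}{\text{day}} \times 1000\,\tfrac{\text{tokens}}{\text{article}}
= 10^{6}\,\tfrac{\text{tokens}}{\text{day}}
\quad\Rightarrow\quad
10^{6} \times \frac{\$0.02}{10^{6}} = \$0.02 \text{ per day} \approx \$0.60 \text{ per month}.
\]
```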
The embedding column drives two later behaviors:
Topic clustering for the daily briefing.
Opposite article retrieval for one of the agents, which finds articles from politically opposite outlets covering the same event, using pgvector cosine distance plus a temporal proximity boost.
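In production this ranking would run inside PostgreSQL, where pgvector's `<=>` operator computes cosine distance. The in-memory sketch below only illustrates the idea. The exponential time bonus, its `boost` and `tau_hours` parameters, and the rule that "opposite" means an outlet bias score of the other sign are all assumptions, not the actual scoring used by sinsesgo.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Candidate:
    url: str
    outlet_bias: float  # hand-curated outlet score in [-1, 1]
    embedding: list[float]
    published: datetime


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def find_opposite(query_embedding, query_bias, query_time, candidates,
                  k=3, boost=0.1, tau_hours=24.0):
    """Rank articles from the other side of the spectrum covering the same event.

    Lower score is better: cosine distance minus a bonus that decays
    exponentially with the time gap between the two articles.
    """
    scored = []
    for c in candidates:
        # Hypothetical rule: "opposite" means a bias score of the other sign.
        if c.outlet_bias * query_bias >= 0:
            continue
        hours_apart = abs((c.published - query_time).total_seconds()) / 3600
        time_bonus = boost * math.exp(-hours_apart / tau_hours)
        scored.append((cosine_distance(query_embedding, c.embedding) - time_bonus, c.url))
    scored.sort()
    return [url for _, url in scored[:k]]
```

Under this sign rule a perfectly centrist article (bias 0) has no opposite at all, so a real implementation would need an explicit policy for that case.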
Stage 3: Topic clustering
This is where the briefing starts to take shape. I take the embeddings of every article published on a given day and cluster them to find the main topics. I use Agglomerative Clustering from scikit-learn...