Creating another MCP server, but this one is for research - While in the lab
In the weeks following my last blog post, I had a niggling feeling that I could apply an MCP server to my literature review. This post is my first run at exploring that.
Literature review has been one of the hardest things I’ve had to learn during my grad degree: How do you take papers and distill them down to a supporting argument, a newfound gap, or evidence that something you want to build has a high chance of working? It’s really time-consuming at first and, to my surprise, isn’t a uniform process across labs and researchers. I keep a massive Google Sheet with a tab for each area I’m investigating, a column for each aspect I want to explore in that area, and a row for each paper I’ve found. With some starter papers in hand, I review the papers each one cites (backward pass) and the papers that cite it (forward pass), and add new papers to the spreadsheet based on relevance to the area. Another student I know follows something similar to a PRISMA approach: reviewing one conference at a time, screening papers, and including works in their manuscript based on eligibility. Another student frantically searches keywords on Google Scholar right before the deadline (this is ill-advised).
However, not everyone takes this approach. Last year I reviewed a paper where every single citation was fake. All of them. It was not only a waste of my time, but if I hadn’t caught it, it could have been published and given false credibility to an idea that hadn’t been proven. This is one of the reasons why the pre-print site arXiv announced they’re banning authors if they let LLMs generate the paper, and why ACL will desk reject papers with fake citations.
I think LLMs can have their place in the research process, but the review of related work is what makes a work trustworthy and part of a solid foundation for others to use. There are some LLM tools that try to improve the paper-finding process (Google Scholar Labs, Undermind.ai), but reproducibility of that search process can still be an issue, and it can be unclear how a selected work fits into the broader scope of an area.
The jury is still out on how these models will be used in the future, but I was interested in how I could use one to make a tool to aid in my reviews.
Initial idea
I wanted to build a tool that helped me make arguments but was also grounded in real research concepts and could be audited for correctness.
I started with my spreadsheet of papers I’ve already reviewed and decided that building an MCP server that could look through those made sense. This way I also don’t have any real storage costs, since I could use the Google Docs API to “host” the detailed attributes of papers that I had already vetted. There are 16 active-ish sheets with about 30 papers each, but I started with just four sheets (i.e. research areas).
A screenshot of my literature review spreadsheet on Google Sheets
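To make the "spreadsheet as storage" idea concrete, here is a minimal sketch of reading one tab as a list of paper records. It is not the script from this post. It assumes the sheet is shared so anyone with the link can view it, and it swaps in the CSV export URL in place of an authenticated API call. The spreadsheet ID placeholder, the `gid` value (the number in the sheet URL that identifies a tab), and the function names are all hypothetical.

```python
import csv
import io
import urllib.request

# Hypothetical placeholder: substitute your own spreadsheet ID.
SPREADSHEET_ID = "YOUR_SPREADSHEET_ID"


def sheet_csv_url(spreadsheet_id: str, gid: str) -> str:
    """Build the CSV export URL for one tab of a link-shared Google Sheet."""
    return (
        f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}"
        f"/export?format=csv&gid={gid}"
    )


def parse_papers(csv_text: str) -> list[dict[str, str]]:
    """Turn CSV text into one dict per paper, keyed by column header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    papers = []
    for row in reader:
        # Drop unnamed columns, trim whitespace, and treat missing cells as "".
        cleaned = {k.strip(): (v or "").strip() for k, v in row.items() if k}
        # Skip rows where every cell is empty.
        if any(cleaned.values()):
            papers.append(cleaned)
    return papers


def fetch_papers(spreadsheet_id: str, gid: str) -> list[dict[str, str]]:
    """Download one tab and parse it (needs network access and a shared sheet)."""
    with urllib.request.urlopen(sheet_csv_url(spreadsheet_id, gid)) as resp:
        return parse_papers(resp.read().decode("utf-8"))
```

Because each row becomes a dict keyed by the column header, adding a new "aspect" column in the spreadsheet shows up in the records without any code change, which fits the one-column-per-aspect layout described above.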
I used Claude to write a short Python script to create a basic MCP server that I could host on AWS Lambda (free tier!). The server had a function (technically called a tool) for each sheet and an additional prompt for a description of my prototype, my study goal, and my planned study procedure.
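The "one tool per sheet" layout can be sketched without any framework. The block below is a simplified stand-in, not the script Claude wrote for me. It answers only the `tools/list` and `tools/call` JSON-RPC methods and skips the initialization handshake and prompt handling a real MCP client expects. The area names, the paper data, and the assumption that the request arrives in `event["body"]` are all made up for illustration; in practice an MCP SDK would handle the protocol details.

```python
import json

# Hypothetical stand-in data; a real server would read this from the spreadsheet.
PAPERS_BY_AREA = {
    "area_a": [{"Title": "Paper A", "Method": "Interviews"}],
    "area_b": [{"Title": "Paper B", "Method": "Survey"}],
}


def make_tool(area: str):
    """Create the function that returns every vetted paper for one area."""
    def tool(arguments: dict) -> str:
        return json.dumps(PAPERS_BY_AREA[area])
    tool.__doc__ = f"Return the vetted papers for the {area} sheet."
    return tool


# One tool per sheet, keyed by the name the client will call.
TOOLS = {f"get_{area}_papers": make_tool(area) for area in PAPERS_BY_AREA}


def error(request: dict, code: int, message: str) -> dict:
    """Build a JSON-RPC error response for the given request."""
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "error": {"code": code, "message": message},
    }


def handle_request(request: dict) -> dict:
    """Answer a single JSON-RPC request for tools/list or tools/call."""
    method = request.get("method")
    if method == "tools/list":
        result = {
            "tools": [
                {
                    "name": name,
                    "description": fn.__doc__,
                    "inputSchema": {"type": "object", "properties": {}},
                }
                for name, fn in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        params = request.get("params", {})
        name = params.get("name")
        if name not in TOOLS:
            return error(request, -32602, f"Unknown tool: {name}")
        text = TOOLS[name](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return error(request, -32601, f"Method not found: {method}")
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


def lambda_handler(event, context):
    """AWS Lambda entry point, assuming the JSON-RPC request is in event['body']."""
    response = handle_request(json.loads(event["body"]))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(response),
    }
```

Building each tool through `make_tool` means every function remembers its own area, so adding a research area is a matter of adding one entry to the data and nothing else.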
Once I wired the server into Cursor, I gave it a go:
Generating summaries of methods based on prior work I’ve reviewed
Generating arguments based on methods in prior work
I like this because I know how it arrived at these claims and I know which papers I need to review if I want more details on them. Plus, if I make changes in my spreadsheet, they’re instantly reflected in my server. That being said, it isn’t perfect: the output is unstructured and some papers were repeated since I didn’t have a broad enough scope of that sub-area, but I think I could solve that with an updated review.
My next step was a new sheet/tab that holds all the gids (indices in Google Sheets that reference each sheet in the file) associated with an area. This made it so I could auto-generate a function that I could call for getting the papers from each area. I also added search functions so in...