Ground Control: A zero-backend admin tool to process 50k records in the browser


Ground Control: The Admin Dashboard Nobody Sees | Praveen Kumar Purushothaman | My Adventures

SSoC 2026 has 50,000 registered participants, 300+ projects, and 5,000+ mentors. The public-facing site is what everyone sees: the leaderboard, the project cards, the contributor profiles. But the tool that actually keeps the programme running lives at a route that is not in the navbar, not in the footer, and not linked from anywhere. You get to it by navigating to the hidden /local-prav-tools route and clicking “Ground Control”.

This is the story of building internal tooling for a programme about open source, and why the most impactful code we wrote this season is the code nobody will ever see.

The Problem: Death by Discord DM

Every open-source programme at scale has the same failure mode. A contributor opens a Discord thread:

“Hey, I submitted a PR two days ago but my score isn’t showing on the leaderboard.”

Sounds simple. It is not. To debug this, you need to check three separate data sources:

Registration data: Did they register? Is their GitHub username spelled correctly? Did they paste the full URL instead of just the username?

Scoring engine output: Did the scoring engine pick up their PR? Is there a case mismatch between UserName and username?

Project registry: Is the repo they contributed to actually a registered SSoC project?

Each of these lives in a different place. Registration data is a Google Form that exports to a TSV. Scoring engine output is a set of JSON files generated by a Node script that runs on someone’s laptop. The project registry is a JSON config file in the site repo.

Before Ground Control, resolving a single “where’s my score?” question took 15-30 minutes of manual cross-referencing. With 50,000 participants, even a 1% confusion rate means 500 support threads, which at 15-30 minutes each is roughly 125 to 250 hours of manual work. The math does not work.

Ground Control collapses that 15-30 minutes to about 10 seconds.

Architecture: Deliberately Unsophisticated

Ground Control has no backend. No database. No API. The entire system runs on two data sources:

A TSV file exported from Google Forms, dropped into /LocalData/MasterSheetsData.tsv

A set of JSON globals (window.prs, window.userScores, window.paMetrics) injected via script tags from the scoring engine

That is it. The React component fetches the TSV over a local HTTP request, parses it in the browser, and cross-references it with whatever scoring data happens to be loaded in the page.

This is a deliberate architectural choice, not a shortcut. The registration data contains phone numbers, email addresses, and LinkedIn profiles. Running a backend means that data traverses a network. With Ground Control running purely client-side against a local file, registration data never leaves the admin’s machine. Privacy by architecture, not by policy.
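The cross-referencing step can be sketched in a few lines. This is a minimal illustration, not the actual Ground Control code: the `Registration` and `ScoreEntry` shapes, and the names `keyOf`, `indexBy`, and `lookupParticipant`, are assumptions made up for the example, since the real shape of `window.userScores` is not shown here. The one firm detail is that GitHub usernames are case-insensitive, so both sides are keyed on a lowercased username to avoid the UserName/username mismatch.

```typescript
// Hypothetical shapes; the real TSV columns and scoring JSON may differ.
interface Registration {
  name: string;
  email: string;
  github: string; // bare username, already cleaned
}

interface ScoreEntry {
  username: string;
  score: number;
}

// GitHub usernames are case-insensitive, so compare on a normalised key.
const keyOf = (username: string): string => username.trim().toLowerCase();

// Build a lookup table once so each support query is a constant-time hit.
function indexBy<T>(items: T[], getUsername: (item: T) => string): Map<string, T> {
  const index = new Map<string, T>();
  for (const item of items) {
    index.set(keyOf(getUsername(item)), item);
  }
  return index;
}

interface LookupResult {
  registration: Registration | undefined;
  score: ScoreEntry | undefined;
}

// Answer "where's my score?" by checking both sources under the same key.
function lookupParticipant(
  registrations: Map<string, Registration>,
  scores: Map<string, ScoreEntry>,
  username: string,
): LookupResult {
  const key = keyOf(username);
  return { registration: registrations.get(key), score: scores.get(key) };
}

// Made-up sample data, for illustration only.
const sampleRegistrations = indexBy(
  [{ name: "Asha Rao", email: "asha@example.com", github: "AshaRao" }],
  (r) => r.github,
);
const sampleScores = indexBy([{ username: "asharao", score: 40 }], (s) => s.username);
const sampleResult = lookupParticipant(sampleRegistrations, sampleScores, "ASHARAO");
```

In the browser, the registrations would come from the parsed TSV and the scores from whatever globals the scoring engine injected. A result with a registration but no score (or the reverse) is exactly the kind of mismatch that used to take a manual hunt to find.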

The TSV Parser Nobody Wanted to Write

Google Forms exports TSV. TSV is simple until it is not. The “Project Description” field is a free-text area, which means participants paste multi-line content, which means the TSV has quoted fields that span multiple lines.

The parser handles this by walking through lines and tracking quote parity:

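```typescript
// Assumes `lines` holds the raw TSV text split on newlines.
const logicalRows: string[] = [];
let current = "";
let inQuote = false;

for (const line of lines) {
  if (!inQuote) {
    current = line; // start a new logical row
  } else {
    current += "\n" + line; // continue a quoted field from the previous line
  }

  // An odd number of quotes on a line opens or closes a quoted field.
  const quoteCount = (line.match(/"/g) || []).length;
  if (quoteCount % 2 === 1) {
    inQuote = !inQuote;
  }

  // Outside a quoted field, the logical row is complete.
  if (!inQuote) {
    logicalRows.push(current);
    current = "";
  }
}
```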

Is this a production-grade TSV parser? No. Does it handle every quoting edge case that RFC 4180 defines for CSV? Definitely not. But it handles the specific shape of data that Google Forms actually produces, which is the only shape we care about. Internal tools earn the right to be narrow.
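Once the logical rows are assembled, each one still has to be split into cells. The sketch below shows one way that could look. It is an assumption-laden example rather than the parser Ground Control ships: `splitRow` is a hypothetical name, and it assumes the export wraps multi-line fields in double quotes and writes a literal quote inside such a field as a doubled quote (`""`).

```typescript
// Split one logical row into cells on tab characters.
// Tabs and newlines inside a quoted field stay part of the cell.
// Assumption: a doubled quote ("") inside a quoted field means one literal quote.
function splitRow(row: string): string[] {
  const cells: string[] = [];
  let cell = "";
  let inQuote = false;

  for (let i = 0; i < row.length; i++) {
    const ch = row.charAt(i);
    if (ch === '"') {
      if (inQuote && row.charAt(i + 1) === '"') {
        cell += '"'; // escaped quote inside a quoted field
        i++; // skip the second quote of the pair
      } else {
        inQuote = !inQuote; // opening or closing quote, not part of the value
      }
    } else if (ch === "\t" && !inQuote) {
      cells.push(cell); // an unquoted tab ends the current cell
      cell = "";
    } else {
      cell += ch;
    }
  }

  cells.push(cell); // the last cell has no trailing tab
  return cells;
}
```

Like the row joiner, this stays deliberately narrow: it does not try to recover from an unbalanced quote, it just follows the shape the export is assumed to have.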

The Registrations Tab: Validate Everything

The first thing Ground Control does when it loads the TSV is run every single row through a validation pipeline. Not because we are pedantic about data quality, but because dirty registration data causes real downstream failures.

Here is what it checks:

Name casing: Is “john doe” actually “John Doe”? The toTitleCase function is naive but effective.

GitHub URL cleanup: People paste https://github.com/username/, github.com/username, @username, or just username. The extractGitHubUsername function normalises all of these down to the bare username. If the stored value differs from the clean version, that is an issue.

LinkedIn URL cleanup: Strip query parameters and fragments. LinkedIn URLs with ?utm_source=... trailing params break nothing but look sloppy and signal data quality problems.

Phone format: Must be international format (+91...). India-centric, yes, but that is where 90%+ of our participants are.

Email validation: Basic regex. Catches the obvious typos.
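The checks above can be sketched as a handful of small pure functions. Only the names toTitleCase and extractGitHubUsername come from the tool itself; their bodies here are guesses at what such helpers might look like. Everything else (`cleanLinkedInUrl`, `isInternationalPhone`, `isPlausibleEmail`, `findIssues`, the `RegistrationRow` shape, and the exact regexes) is hypothetical and stands in for whatever the real pipeline uses.

```typescript
// Hypothetical row shape; the real TSV has more columns.
interface RegistrationRow {
  name: string;
  github: string;
  linkedin: string;
  phone: string;
  email: string;
}

// Naive title casing: capitalise the first letter of each word.
function toTitleCase(name: string): string {
  return name
    .trim()
    .split(/\s+/)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase())
    .join(" ");
}

// Reduce a full URL, a bare domain path, or an @handle to the username.
function extractGitHubUsername(raw: string): string {
  const stripped = raw
    .trim()
    .replace(/^https?:\/\//i, "")
    .replace(/^(www\.)?github\.com\//i, "")
    .replace(/^@/, "");
  return stripped.split(/[\/?#]/)[0] ?? "";
}

// Drop query parameters and fragments such as ?utm_source=...
function cleanLinkedInUrl(raw: string): string {
  return raw.trim().split(/[?#]/)[0] ?? "";
}

// Assumed rule: "+" followed by 7 to 15 digits, ignoring spaces and hyphens.
function isInternationalPhone(raw: string): boolean {
  return /^\+[1-9]\d{6,14}$/.test(raw.replace(/[\s-]/g, ""));
}

// Basic shape check only: something@something.something
function isPlausibleEmail(raw: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(raw.trim());
}

// Run every check and collect a label for each one that fails.
function findIssues(row: RegistrationRow): string[] {
  const issues: string[] = [];
  if (row.name !== toTitleCase(row.name)) issues.push("name casing");
  if (row.github !== extractGitHubUsername(row.github)) issues.push("github url");
  if (row.linkedin !== cleanLinkedInUrl(row.linkedin)) issues.push("linkedin url");
  if (!isInternationalPhone(row.phone)) issues.push("phone format");
  if (!isPlausibleEmail(row.email)) issues.push("email");
  return issues;
}
```

Returning a list of issue labels, rather than silently fixing the row, keeps the stored value and the cleaned value comparable, which matches the “if the stored value differs from the clean version, that is an issue” rule.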

Deduplication Logic

People submit the registration form multiple times. A naive dedup by email would break Project Admins, who legitimately submit once per project. So the dedup key is...
