Recruit Cracked Engineers Using Public GitHub Data | Powerset Research
Published May 27, 2026
We've been using public GitHub data internally at Powerset to help our portfolio founders identify and recruit top open source developers.
Today, we're making the underlying data layer freely available for your agents. Check out the repo to get started.
Sometimes a video is worth 1,000 words:
Asking Claude Code to find NYC engineers who've worked on terminal coding agents, using the Powerset GitHub dataset over MCP.
What's in the Data
The data covers ~400,000 active GitHub repos and is queryable in two ways:
MCP, if you want to ask questions conversationally through Claude, Codex, Cursor, or another MCP-compatible client. It's also available as a ChatGPT App.
DuckDB, if you want to attach directly to our DuckLake instance and run SQL yourself.
Repos, contributors, activity, stars, languages, and project metadata are all available in a form your agents can use.
No credentials required.
Example questions you can ask:
Find the 5 most impressive systems architects in San Francisco
Who are the best fits for this role? [insert link to engineering job description]
What are the fastest-growing terminal coding agents?
How it Works
We run a daily cron job on Modal that publishes the data as a frozen DuckLake instance backed by Parquet files on Cloudflare R2.
You can query it through our hosted MCP endpoint or attach to it directly from DuckDB.
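Because the snapshot is republished daily, one rough way to gauge how fresh your copy is might be to look at the most recent push timestamp it contains. This is a sketch, not an official freshness check. It reuses the attach URL from the DuckDB Setup section. It also assumes that `pushed_at` on `github.repos` is a timestamp column. The aliases `latest_push` and `repo_count` are illustrative names only.

```sql
-- Attach the public DuckLake snapshot read-only; IF NOT EXISTS makes re-running safe.
ATTACH IF NOT EXISTS 'ducklake:https://research-data.powerset.dev/github-public/latest/public.ducklake' AS github (READ_ONLY);

-- Assumption: pushed_at is a timestamp, so its maximum approximates how recent the snapshot is.
SELECT
    max(pushed_at) AS latest_push,
    count(*)       AS repo_count
FROM github.repos;
```

The maximum `pushed_at` is only a proxy. It tells you the newest push captured, not the exact time the cron job ran.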
MCP Setup
```shell
# Claude Code
claude mcp add --transport streamable-http powerset-research https://research-mcp.powerset.dev/mcp/

# OpenAI Codex
codex mcp add powerset-research --url https://research-mcp.powerset.dev/mcp/
```
DuckDB Setup
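```sql
-- Attach the latest public snapshot as a read-only database named "github"
ATTACH 'ducklake:https://research-data.powerset.dev/github-public/latest/public.ducklake' AS github (READ_ONLY);

-- The 20 most-starred repos, with their last push time
SELECT name_with_owner, stars_count, pushed_at
FROM github.repos
ORDER BY stars_count DESC
LIMIT 20;
```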
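To restrict that leaderboard to projects that are still active, the same query can be filtered on `pushed_at`. This is a sketch. It assumes `pushed_at` is a date or timestamp column, and the 30-day window is an arbitrary choice, not something the dataset defines. `ATTACH IF NOT EXISTS` lets the block run in a fresh session or after the setup query without an "already attached" error.

```sql
-- Attach the public snapshot read-only (no-op if "github" is already attached).
ATTACH IF NOT EXISTS 'ducklake:https://research-data.powerset.dev/github-public/latest/public.ducklake' AS github (READ_ONLY);

-- Most-starred repos with at least one push in the last 30 days.
-- Assumption: pushed_at is a DATE/TIMESTAMP type; the 30-day window is illustrative.
SELECT name_with_owner, stars_count, pushed_at
FROM github.repos
WHERE pushed_at >= current_date - INTERVAL 30 DAY
ORDER BY stars_count DESC
LIMIT 20;
```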
For agents querying directly through DuckDB, we also provide a skill file with schema context, query patterns, and examples.
Full setup instructions and documentation are available in the research-data repo.
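If you are exploring without the skill file, DuckDB's built-in introspection commands can show what the snapshot contains. This is a minimal sketch that relies only on standard DuckDB statements. The only table it names is `github.repos`, which the setup query already uses. It assumes nothing else about the schema.

```sql
-- Attach the public snapshot read-only (no-op if "github" is already attached).
ATTACH IF NOT EXISTS 'ducklake:https://research-data.powerset.dev/github-public/latest/public.ducklake' AS github (READ_ONLY);

-- List every table in the attached databases, with column names and types.
SHOW ALL TABLES;

-- Column names and types for the repos table.
DESCRIBE github.repos;
```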