Recruit cracked engineers using GitHub


Recruit Cracked Engineers Using Public GitHub Data | Powerset Research

Published May 27, 2026

We've been using public GitHub data internally at Powerset to help our portfolio founders identify and recruit top open source developers.

Today, we're making the underlying data layer freely available for your agents. Check out the repo to get started.

Sometimes a video is worth 1,000 words:

Video: Asking Claude Code to find NYC engineers who've worked on terminal coding agents, using the Powerset GitHub dataset over MCP.

What's in the Data

The data covers ~400,000 active GitHub repos and is queryable in two ways:

MCP, if you want to ask questions conversationally through Claude, Codex, Cursor, or another MCP-compatible client. It is also available as a ChatGPT App.

DuckDB, if you want to attach directly to our DuckLake instance and run SQL yourself.

Repos, contributors, activity, stars, languages, and project metadata are all available in a form your agents can use. No credentials required.

Example questions you can ask:

Find the 5 most impressive systems architects in San Francisco

Who are the best fits for this role? [insert link to engineering job description]

What are the fastest-growing terminal coding agents?

How it Works

We run a daily Modal cron to publish the data as a frozen DuckLake instance backed by Parquet files on Cloudflare R2. You can query it through our hosted MCP endpoint or attach to it directly from DuckDB.

MCP Setup

# Claude Code
claude mcp add --transport streamable-http powerset-research https://research-mcp.powerset.dev/mcp/

# OpenAI Codex
codex mcp add powerset-research --url https://research-mcp.powerset.dev/mcp/

DuckDB Setup

ATTACH 'ducklake:https://research-data.powerset.dev/github-public/latest/public.ducklake' AS github (READ_ONLY);

SELECT name_with_owner, stars_count, pushed_at
FROM github.repos
ORDER BY stars_count DESC
LIMIT 20;
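If you would rather run the same statements from a script, here is a minimal Python sketch. It assumes the third-party `duckdb` package is installed and that the `httpfs` and `ducklake` extensions can be installed on your machine (recent DuckDB versions may autoload them, so the explicit `INSTALL`/`LOAD` steps may be unnecessary). The helper names `attach_sql`, `top_repos_sql`, and `fetch_top_repos` are hypothetical and not part of the Powerset repo. The table and column names are taken from the query above.

```python
# Sketch only: helper names are illustrative, not part of any Powerset tooling.
ATTACH_URL = (
    "ducklake:https://research-data.powerset.dev/"
    "github-public/latest/public.ducklake"
)


def attach_sql(url: str = ATTACH_URL, alias: str = "github") -> str:
    """Build the read-only ATTACH statement shown in the article."""
    return f"ATTACH '{url}' AS {alias} (READ_ONLY);"


def top_repos_sql(limit: int = 20) -> str:
    """Build the article's top-starred query with a validated LIMIT."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT name_with_owner, stars_count, pushed_at\n"
        "FROM github.repos\n"
        "ORDER BY stars_count DESC\n"
        f"LIMIT {limit};"
    )


def fetch_top_repos(limit: int = 20):
    """Attach the DuckLake catalog and return rows as a list of tuples.

    Needs network access and `pip install duckdb`.
    """
    import duckdb  # imported lazily so the SQL helpers work without it

    con = duckdb.connect()  # in-memory database; nothing is written locally
    try:
        # Assumption: explicit install/load; autoloading may make this redundant.
        for ext in ("httpfs", "ducklake"):
            con.execute(f"INSTALL {ext}")
            con.execute(f"LOAD {ext}")
        con.execute(attach_sql())
        return con.execute(top_repos_sql(limit)).fetchall()
    finally:
        con.close()
```

Calling `fetch_top_repos(5)` should return up to five `(name_with_owner, stars_count, pushed_at)` tuples. Keeping the SQL builders separate from the connection code lets an agent inspect or log the exact statements before anything touches the network.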

For agents querying directly through DuckDB, we also provide a skill file with schema context, query patterns, and examples. Full setup instructions and documentation are available in the research-data repo.
