Recursive AI Research Skill for Claude Code / OpenClaw / Codex

GitHub - Toadoum/ai-research-skill · GitHub


references

scripts

AGENTS.md

LESSONS.md

LICENSE

README.md

SKILL.md


🔬 AI Research Skill — a self-improving research agent for Claude, Codex & OpenClaw

One SKILL.md that makes any coding agent move fast through the full ML/AI research loop — and get better every time it makes a mistake.

Hypothesis → literature review → reproduce baseline → leak-free experiments → honest analysis → paper. Works with Claude / Claude Code, OpenAI Codex, and OpenClaw. Recursive by design: it logs its own mistakes as rules so it never repeats them.

⭐ If this saves you from one leaked-label result, please star the repo — it helps other researchers find it.

Why this exists

Most AI-agent "research" help is a chatbot that sounds confident and cites papers that don't exist. Real research fails in specific, boring, expensive ways: a baseline you quoted instead of ran, a metric that's secretly leaking the label, a "gain" that's really one lucky seed, a citation invented from memory.

This skill encodes the discipline that catches those failures before they cost you a month, as a portable SKILL.md your agent reads automatically. And it's recursive: when the agent makes a mistake and fixes it, it writes the lesson to LESSONS.md, reads that file at the start of every future task, and stops repeating itself. The skill you use in month three is sharper than the one you installed.

What it does

| Stage | What the agent does | Guardrail it enforces |
| --- | --- | --- |
| Frame | Turns a vague idea into a testable hypothesis + a stated delta vs prior work | No experiment until the claim is one sentence |
| Review | Finds the 5–15 papers that matter, builds a comparison matrix | Cite only papers actually read, never from memory |
| Reproduce | Runs the strongest baseline on your setup first | You need a ruler before you measure a gain |
| Design | Sets seeds, fixes splits, runs a full leakage audit | Suspiciously-good ≠ breakthrough; prove it's not leakage |
| Run | Scaffolds configs so every number is reproducible | The config is the single source of truth |
| Analyze | Compares vs baseline with mean ± std over ≥3 seeds | A single-seed win is a story, not a finding |
| Write | Backs every claim with a number; ships a reproducibility checklist | Never drop the seed/dataset that hurt the story |
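As a rough illustration of the Analyze guardrail, here is a minimal sketch of a mean ± std comparison over seeds. It is not code from this repo: the function names, the example scores, and the "gain must exceed the summed standard deviations" threshold are all assumptions chosen for illustration, and a real analysis might prefer a proper significance test.

```python
from statistics import mean, stdev

MIN_SEEDS = 3  # mirrors the ">=3 seeds" guardrail in the table above


def summarize(scores):
    """Return (mean, sample std) of one metric measured across seeds."""
    if len(scores) < MIN_SEEDS:
        raise ValueError(f"need at least {MIN_SEEDS} seeds, got {len(scores)}")
    return mean(scores), stdev(scores)


def gain_exceeds_noise(baseline, candidate):
    """Compare two sets of per-seed scores.

    Returns (gain, noise, verdict), where noise is the sum of both sample
    standard deviations. That threshold is a deliberately conservative,
    hypothetical choice, not a statistical test.
    """
    b_mean, b_std = summarize(baseline)
    c_mean, c_std = summarize(candidate)
    gain = c_mean - b_mean
    noise = b_std + c_std
    return gain, noise, gain > noise


# Made-up scores for illustration only (one value per seed).
baseline_f1 = [0.812, 0.806, 0.818]
candidate_f1 = [0.815, 0.809, 0.823]

gain, noise, verdict = gain_exceeds_noise(baseline_f1, candidate_f1)
print(f"gain={gain:.4f} noise={noise:.4f} exceeds_noise={verdict}")
```

With numbers like these the candidate's average is higher, yet the gain sits inside the seed-to-seed spread, which is exactly the case the "single-seed win" guardrail is meant to catch.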

Quickstart

Clone it, then drop it where your agent looks for skills:

git clone https://github.com/<your-username>/ai-research-skill.git

Claude / Claude Code / Claude Cowork

Install the folder (or a packaged .skill bundle) into your skills directory. Claude keeps the name + description in context always, and loads the full skill when your task looks like AI/ML research. Then just work normally ("help me reproduce this paper's baseline", "why is my F1 suspiciously high?") and it kicks in.

OpenClaw 🦞

cp -r ai-research-skill ~/.openclaw/workspace/skills/ai-research

OpenClaw reads the same SKILL.md frontmatter + body, and its injected AGENTS.md path lands in the same place. Inspect any skill before installing it — treat community skills like npm packages from strangers.

Codex & other AGENTS.md agents

Keep AGENTS.md at your repo root (it's a thin pointer to SKILL.md). Codex reads AGENTS.md and follows the skill from there.

The self-improving loop (the interesting part)

start task ─▶ read LESSONS.md ─▶ do research ─▶ made a mistake?
              ▲                                             │ yes
              │                                             ▼
              └────────── LESSONS.md now has a new rule ◀── log_lesson.py

When the agent catches an error, it runs:

python scripts/log_lesson.py \
  --trigger "reported a gain from one training run" \
  --mistake "claimed 'beats baseline' from a single seed" \
  --fix "re-ran 3 seeds; gain was inside the noise band"...
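To make the loop concrete, here is a minimal sketch of what appending a lesson and reading it back at the start of a task could look like. This is not the repo's scripts/log_lesson.py, whose contents are not shown here: the function names, the Markdown entry layout, and the date heading are all hypothetical, and only the trigger / mistake / fix / tags fields are taken from the command above.

```python
import tempfile
from datetime import date
from pathlib import Path


def append_lesson(path, trigger, mistake, fix, tags=()):
    """Append one lesson to a Markdown file (entry layout is an assumption)."""
    entry = (
        f"\n## {date.today().isoformat()}\n"
        f"- Trigger: {trigger}\n"
        f"- Mistake: {mistake}\n"
        f"- Fix: {fix}\n"
        f"- Tags: {', '.join(tags)}\n"
    )
    # Append mode keeps earlier lessons intact and creates the file if needed.
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(entry)


def read_lessons(path):
    """Return the lessons text, or an empty string if no file exists yet."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.exists() else ""


# Demo against a temporary directory so no real LESSONS.md is touched.
with tempfile.TemporaryDirectory() as tmp:
    lessons_file = Path(tmp) / "LESSONS.md"
    append_lesson(
        lessons_file,
        trigger="reported a gain from one training run",
        mistake="claimed 'beats baseline' from a single seed",
        fix="re-ran 3 seeds; gain was inside the noise band",
        tags=("seeds", "reproducibility"),
    )
    print(read_lessons(lessons_file))
```

Appending rather than rewriting keeps the file an ordered history, so an agent that reads it at the start of each task sees every earlier rule.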
