GitHub - Toadoum/ai-research-skill · GitHub
🔬 AI Research Skill: a self-improving research agent for Claude, Codex & OpenClaw
One SKILL.md that makes any coding agent move fast through the full ML/AI research loop and get better every time it makes a mistake.
Hypothesis → literature review → reproduce baseline → leak-free experiments → honest analysis → paper. Works with Claude / Claude Code, OpenAI Codex, and OpenClaw. Recursive by design: it logs its own mistakes as rules so it never repeats them.
⭐ If this saves you from one leaked-label result, please star the repo; it helps other researchers find it.
Why this exists
Most AI-agent "research" help is a chatbot that sounds confident and cites papers that don't exist. Real research fails in specific, boring, expensive ways: a baseline you quoted instead of ran, a metric that's secretly leaking the label, a "gain" that's really one lucky seed, a citation invented from memory.
This skill encodes the discipline that catches those failures before they cost you a month, as a portable SKILL.md your agent reads automatically. And it's recursive: when the agent makes a mistake and fixes it, it writes the lesson to LESSONS.md, reads that file at the start of every future task, and stops repeating itself. The skill you use in month three is sharper than the one you installed.
What it does
| Stage | What the agent does | Guardrail it enforces |
| --- | --- | --- |
| Frame | Turns a vague idea into a testable hypothesis + a stated delta vs prior work | No experiment until the claim is one sentence |
| Review | Finds the 5–15 papers that matter, builds a comparison matrix | Cite only papers actually read, never from memory |
| Reproduce | Runs the strongest baseline on your setup first | You need a ruler before you measure a gain |
| Design | Sets seeds, fixes splits, runs a full leakage audit | Suspiciously-good ≠ breakthrough; prove it's not leakage |
| Run | Scaffolds configs so every number is reproducible | The config is the single source of truth |
| Analyze | Compares vs baseline with mean ± std over ≥3 seeds | A single-seed win is a story, not a finding |
| Write | Backs every claim with a number; ships a reproducibility checklist | Never drop the seed/dataset that hurt the story |
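
The Analyze guardrail (mean ± std over ≥3 seeds) can be illustrated with a minimal sketch. This is not code from the repo: the function names `summarize` and `gain_exceeds_noise`, the "gain must exceed the summed standard deviations" rule of thumb, and the scores are all assumptions made for illustration. A proper significance test may be more appropriate for a paper.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample std) of per-seed scores; refuse fewer than 3 seeds."""
    if len(scores) < 3:
        raise ValueError("need at least 3 seeds to report mean ± std")
    return mean(scores), stdev(scores)


def gain_exceeds_noise(baseline_scores, method_scores):
    """Crude check (an assumption, not the skill's rule): is the mean gain
    larger than the combined seed-to-seed spread of both methods?"""
    base_mean, base_std = summarize(baseline_scores)
    new_mean, new_std = summarize(method_scores)
    gain = new_mean - base_mean
    noise = base_std + new_std
    return gain, noise, gain > noise


# Made-up scores for illustration only: three seeds per method.
baseline = [0.81, 0.83, 0.82]
method = [0.84, 0.82, 0.83]
gain, noise, ok = gain_exceeds_noise(baseline, method)
print(f"gain={gain:.3f} noise={noise:.3f} exceeds_noise={ok}")
```

With these made-up numbers the gain sits inside the noise band, which is the situation the "single-seed win is a story, not a finding" guardrail is meant to catch.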
Quickstart
Clone it, then drop it where your agent looks for skills:
git clone https://github.com/<your-username>/ai-research-skill.git
Claude / Claude Code / Claude Cowork
Install the folder (or a packaged .skill bundle) into your skills directory. Claude keeps the name + description in context always, and loads the full skill when your task looks like AI/ML research. Then just work normally ("help me reproduce this paper's baseline", "why is my F1 suspiciously high?") and it kicks in.
OpenClaw 🦞
cp -r ai-research-skill ~/.openclaw/workspace/skills/ai-research
OpenClaw reads the same SKILL.md frontmatter + body, and its injected AGENTS.md path lands in the same place. Inspect any skill before installing it; treat community skills like npm packages from strangers.
Codex & other AGENTS.md agents
Keep AGENTS.md at your repo root (it's a thin pointer to SKILL.md). Codex reads AGENTS.md and follows the skill from there.
The self-improving loop (the interesting part)
start task ─▶ read LESSONS.md ─▶ do research ─▶ made a mistake?
     ▲                                               │ yes
     │                                               ▼
     └───────── LESSONS.md now has a new rule ◀── log_lesson.py
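
The loop in the diagram can be sketched in a few lines of Python. This is a hypothetical illustration, not the contents of the repo's scripts/log_lesson.py: the helper names (`read_lessons`, `append_lesson`, `run_task`), the one-bullet-per-lesson entry format, and the `do_research` callback are all assumptions. Only the trigger / mistake / fix fields mirror the flags shown in this README.

```python
import tempfile
from datetime import date
from pathlib import Path


def read_lessons(path):
    """Return logged rules (lines starting with '- '), or [] if no file exists yet."""
    p = Path(path)
    if not p.exists():
        return []
    lines = p.read_text(encoding="utf-8").splitlines()
    return [line[2:].strip() for line in lines if line.startswith("- ")]


def append_lesson(path, trigger, mistake, fix):
    """Append one lesson as a single bullet line (assumed format, not the repo's)."""
    entry = (
        f"- [{date.today().isoformat()}] trigger: {trigger} "
        f"| mistake: {mistake} | fix: {fix}\n"
    )
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(entry)


def run_task(path, do_research):
    """One pass through the loop: read lessons, do the work, log any mistake.

    do_research is a hypothetical callback that receives the known lessons and
    returns None or a dict with 'trigger', 'mistake' and 'fix' keys.
    """
    lessons = read_lessons(path)
    mistake = do_research(lessons)
    if mistake is not None:
        append_lesson(path, **mistake)
    return len(lessons)


# Demo in a throwaway directory: the first task logs a lesson, the second sees it.
with tempfile.TemporaryDirectory() as tmp:
    lessons_file = Path(tmp) / "LESSONS.md"
    seen_first = run_task(lessons_file, lambda lessons: {
        "trigger": "reported a gain from one training run",
        "mistake": "claimed 'beats baseline' from a single seed",
        "fix": "re-ran 3 seeds; gain was inside the noise band",
    })
    seen_second = run_task(lessons_file, lambda lessons: None)
    print(seen_first, seen_second)
```

The point of the sketch is the ordering: lessons are read before any work starts, and a new rule is appended only after a mistake has been caught and fixed, so the next task starts with one more rule than the last.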
When the agent catches an error, it runs:
python scripts/log_lesson.py \
  --trigger "reported a gain from one training run" \
  --mistake "claimed 'beats baseline' from a single seed" \
  --fix "re-ran 3 seeds; gain was inside the noise band"...