DAAF: Rigorous+responsible data analysis/research with Claude Code (open-source)

brhkim | 1 pts | 0 comments

DAAF - Data Analyst Augmentation Framework

Your expertise, empowered by Claude

Scale Responsible, Rigorous, and Reproducible Research with AI

AI coding assistants can now write analytic code, manipulate complex datasets, and produce entire reports. They also hallucinate, cut corners, and constantly project confidence.

The Data Analyst Augmentation Framework is a free, open-source toolkit that turns Claude Code into a rigorous quantitative research engine with you at the helm: every step auditable, every output verifiable, every decision yours to make. Built by researchers, for researchers.

Claude Code v2.1.112, Opus 4.6 (1M context)

User: Hey DAAF, I'm thinking of running a regression discontinuity analysis with my own dataset. Before I do anything else, can you help me think through conceptually what assumptions/conditions are required for a good RD?

You interact with DAAF and Claude Code just like any chat window: describe what you want to do in plain, natural language. There is no special syntax and there are no commands to juggle or remember.

DAAF: Thinking…

DAAF: Welcome to DAAF -- the Data Analyst Augmentation Framework. I'm your research orchestrator, here to help you work through rigorous, reproducible data analysis with human oversight at every key decision point.

Great question -- thinking through RD assumptions before touching data is exactly the right instinct. This is a conceptual/advisory conversation, so I'd classify this as Ad Hoc Collaboration mode. I'll work with you as a thought partner to think through the assumptions and conditions for a valid regression discontinuity design. No formal deliverables -- just a focused discussion. You can steer the conversation wherever it's most useful.

Sound good, or would you rather approach this differently?

DAAF intelligently determines which bespoke workflow best fits your request. Like a good collaborator, it communicates its plan and asks for your confirmation before proceeding. Ad Hoc Collaboration mode is a lightweight workflow for informal discussion and one-off tasks -- one of nine workflow styles DAAF supports.

User: That makes sense and sounds good, let's proceed

DAAF: Let me load the mode reference, then we'll dive in.

DAAF: Read(skills/daaf-orchestrator/.../ad-hoc-collaboration-mode.md)❲ Read 421 lines

DAAF: Now let me load the data-scientist skill as the mode requires, then I'll walk through the RD assumptions with you.

DAAF: Skill(data-scientist)❲ Successfully loaded skill

DAAF: Now let me pull in the causal inference reference for the detailed RD methodology.

DAAF: Read(skills/data-scientist/.../causal-inference.md)❲ Read 898 lines

DAAF: Thinking…

DAAF dynamically selects relevant references from a large library of carefully vetted guides ("Agent Skills"), then embeds them into the conversation so that its reasoning from that point on is grounded in known-good material, which helps fight hallucinations. Hover over any file to learn more and see what those reference files actually contain.

DAAF: Great question to start with -- getting the conceptual foundations right before touching data is exactly how credible causal work begins. Here's a thorough walkthrough of what makes an RD design valid.

Regression Discontinuity: The Core Assumptions

RD exploits a known cutoff in a running variable (also called a "score" or "forcing variable") that determines treatment assignment. The fundamental idea is that units just above and just below the cutoff are essentially identical except for their treatment status -- creating quasi-random variation near the threshold.

1. A Known, Rule-Based Cutoff

There must be a clear, predetermined threshold where treatment status changes. Examples: a test score cutoff for scholarship eligibility, an income threshold for program access, an age cutoff for policy eligibility.

The cutoff must be rule-based, not discretionary. If a committee decides who gets treated using the score as one input among many, you don't have an RD -- you have selection on observables (at best).

2. Continuity of Potential Outcomes at the Cutoff

This is the core identifying assumption: absent the treatment, outcomes would have evolved smoothly through the cutoff. In potential-outcomes notation, the conditional expectations E[Y(0)|X=x] and E[Y(1)|X=x] must be continuous in x at the cutoff c.
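To make the continuity idea concrete, here is a minimal sketch on simulated data, not DAAF's own code. It assumes a sharp design, a uniform kernel, and a hand-picked bandwidth. The function name `sharp_rd_estimate` and all parameter values are hypothetical, chosen only for illustration. It fits a separate line on each side of the cutoff and reads off the jump. When the outcome is smooth through the cutoff (the placebo series), the estimated jump should be close to zero.

```python
import numpy as np

def sharp_rd_estimate(x, y, cutoff, bandwidth):
    """Local linear estimate of the jump in E[Y|X=x] at the cutoff (uniform kernel)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - cutoff                      # center the running variable at the cutoff
    keep = np.abs(xc) <= bandwidth       # use only observations near the cutoff
    xc, y = xc[keep], y[keep]
    treated = (xc >= 0).astype(float)    # sharp design: treatment is fully determined by the score
    # Columns: intercept, jump at cutoff, slope below, change in slope above
    design = np.column_stack([np.ones_like(xc), treated, xc, treated * xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# Simulated data (illustrative assumption): a true jump of 2.0 at cutoff 0
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1, 1, n)
noise = rng.normal(0, 0.5, n)
true_effect = 2.0
y = 1.0 + 0.5 * x + true_effect * (x >= 0) + noise
placebo_y = 1.0 + 0.5 * x + noise        # smooth through the cutoff: no jump

estimate = sharp_rd_estimate(x, y, cutoff=0.0, bandwidth=0.5)
placebo = sharp_rd_estimate(x, placebo_y, cutoff=0.0, bandwidth=0.5)
print(f"estimated jump: {estimate:.3f}, placebo jump: {placebo:.3f}")
```

In real work the bandwidth would come from a data-driven selector and the standard errors from a method designed for RD. The fixed bandwidth here is only a simplification.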

3. No Manipulation of the Running Variable (No Sorting)

Units cannot precisely control their score to land on a preferred side of the cutoff. The McCrary (2008) density test checks whether the distribution of the running variable is smooth at the cutoff.
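As a rough illustration of what a sorting check looks for, the sketch below compares how many units fall just below versus just above the cutoff. This is a crude stand-in, not the McCrary density test itself. It assumes the density is roughly flat inside a narrow window, so the two counts should be about equal, and it uses a normal approximation to the binomial. The function name `side_count_check`, the window width, and the simulated manipulation are all hypothetical.

```python
import math
import numpy as np

def side_count_check(x, cutoff, window):
    """Count units just below/above the cutoff and test for a 50/50 split."""
    x = np.asarray(x, dtype=float)
    below = int(np.sum((x >= cutoff - window) & (x < cutoff)))
    above = int(np.sum((x >= cutoff) & (x <= cutoff + window)))
    total = below + above
    if total == 0:
        raise ValueError("no observations inside the window")
    # Under no sorting and a locally flat density, above ~ Binomial(total, 0.5)
    z = (above - total / 2) / math.sqrt(total / 4)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal approximation
    return below, above, p_value

rng = np.random.default_rng(1)
scores = rng.uniform(-1, 1, 4000)                # smooth running variable, no sorting
# Hypothetical manipulation: units scoring just below 0 nudge themselves above it
manipulated = np.where((scores >= -0.05) & (scores < 0), scores + 0.05, scores)

_, _, p_smooth = side_count_check(scores, cutoff=0.0, window=0.1)
below_m, above_m, p_manip = side_count_check(manipulated, cutoff=0.0, window=0.1)
print(f"smooth p-value: {p_smooth:.3f}")
print(f"manipulated: {below_m} below vs {above_m} above, p-value: {p_manip:.2e}")
```

A very small p-value in the manipulated series signals bunching on one side of the cutoff, which is the pattern a formal density test is designed to detect.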

4. Sharp vs. Fuzzy -- Know Which You Have

Even in short interactions, DAAF grounds itself in real citations and references to avoid relying on its fuzzy general knowledge or inconsistent results from web searches. Have more follow-ups? Want to write some actual code? DAAF is...
