Exploring Agent-Assisted Qualitative Analysis
Now that I’ve finished a long year (years, really) searching for a faculty job and accepted an offer, I can finally get back to my usual blogging antics! After coming back from all my interviews, it seemed that AI agents suddenly got a lot better at everything, so I wondered: what are some challenging workflows I did during my PhD, and could AI agents help me automate parts of them?
One workflow that felt particularly interesting to revisit was qualitative analysis.
Qualitative analysis is basically the process of reading a lot of messy unstructured data and trying to figure out what is interesting, recurring, surprising, or important.
Concretely, the question is: What is the “right” way to do agent-assisted qualitative analysis? This is clearly a big research question, and I can’t answer it in one blog post. Instead, I run some cute experiments with naive agentic setups for qualitative analysis, varying how much the human is in the loop, and report what I learned.
First, I’ll give some background on qualitative analysis. Then I’ll describe the experimental setup, go through the findings, and briefly talk about what I think is exciting to work on next.
Throughout, remember that this is a blog post, not a paper :)
Background
Before getting into the experiments, I’ll give some background on qualitative analysis, the specific methodology I used (grounded theory), and why I think this is such an interesting problem for AI systems.
Despite the specialized name, qualitative analysis is a familiar research practice across many fields, not something confined to ethnography or the social sciences.
For example, mining agent logs for failure modes, analyzing narratives in interview transcripts, synthesizing patterns from user-research sessions, and close-reading news coverage for recurring framings all involve qualitative analysis in some form.
Grounded theory
There are many ways to do qualitative analysis. The one I learned during my PhD is grounded theory: a method for answering a research question by building the answer up from the data itself, rather than starting from a fixed hypothesis.
I will illustrate grounded theory with an example. Suppose the research question is “why do PhD students seriously consider leaving their program?” and the data is a set of interview transcripts with current and former PhD students. Grounded theory usually proceeds in stages:
Open coding. We read through the data and attach short labels (codes) to passages of interest. For example, a passage about a stalled project might get coded as “no progress for months”; one about an unresponsive advisor, “absent advisor”; one about feeling behind peers, “social comparison.” As we move through the corpus, we compare new passages against existing codes, merging similar ones and splitting overly broad ones.
Axial coding. We group related codes into higher-level categories. For example, “absent advisor,” “shifting goals,” and “no progress for months” might cluster into mentorship breakdown, while “social comparison” and “imposter feelings” might cluster into identity strain. We may also look for relationships between categories.
Selective coding. We pick one or two core themes and organize the rest of the theory around them. Maybe mentorship breakdown becomes the “spine” of the story and the remaining categories are treated as upstream causes or downstream consequences.
Throughout the process, we also write memos: informal notes about emerging patterns, uncertainties, and possible interpretations.
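To make the stages concrete, here is a minimal Python sketch of the artifacts each one produces, using the PhD example. Everything in it is made up for illustration: the passage texts, the `Passage` class, the helper names `merge_codes` and `group_by_category`, and the extra code “comparing to peers” are my own assumptions, not a standard tool or the setup used in the experiments.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Passage:
    """A snippet of interview text plus the open codes attached to it."""
    text: str
    codes: set[str] = field(default_factory=set)


def merge_codes(passages, old, new):
    """Open coding: fold one code into another once they turn out to overlap."""
    for p in passages:
        if old in p.codes:
            p.codes.discard(old)
            p.codes.add(new)


def group_by_category(passages, code_to_category):
    """Axial coding: collect passages under the category each of their codes maps to."""
    grouped = defaultdict(list)
    for p in passages:
        categories = {code_to_category[c] for c in p.codes if c in code_to_category}
        for category in sorted(categories):
            grouped[category].append(p)
    return dict(grouped)


# Open coding: hypothetical passages with short labels attached.
passages = [
    Passage("My project has been stuck since last spring.", {"no progress for months"}),
    Passage("My advisor hasn't replied in weeks.", {"absent advisor"}),
    Passage("Everyone in my cohort has published but me.", {"social comparison"}),
    Passage("I keep measuring myself against my labmates.", {"comparing to peers"}),
]
# Two codes describe the same thing, so merge them.
merge_codes(passages, old="comparing to peers", new="social comparison")

# Axial coding: map codes to higher-level categories.
code_to_category = {
    "absent advisor": "mentorship breakdown",
    "shifting goals": "mentorship breakdown",
    "no progress for months": "mentorship breakdown",
    "social comparison": "identity strain",
    "imposter feelings": "identity strain",
}
categories = group_by_category(passages, code_to_category)

# Selective coding: choose a core theme and hang the other categories off it.
core_theme = "mentorship breakdown"
theory = {
    "core_theme": core_theme,
    "related": [name for name in categories if name != core_theme],
}

# Memos: free-form notes kept alongside the codes.
memos = ["Is identity strain a consequence of mentorship breakdown, or a separate cause?"]

for name, members in categories.items():
    print(f"{name}: {len(members)} passages")
```

The data structures are the easy part. The judgment calls (which codes to merge, which category becomes the spine) are hard-coded here, and those are exactly the decisions that make the analysis difficult.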
Why agent-assisted qualitative analysis is a good problem to work on
Qualitative analysis is genuinely hard for humans. It is tedious. It requires manually reading through the data, thinking about it, and deciding what is interesting. Coding a single interview transcript can take hours, plus additional time for finding themes across many interviews. It does not scale easily.
Qualitative analysis is also hard for AI because the “right” analysis depends heavily on context outside the corpus itself. In the PhD example above, one researcher might focus on advisor relationships and build a theory around mentorship breakdown, while another might focus on identity, isolation, or academic incentives. Neither analysis is necessarily wrong; they emphasize different aspects of the same interviews based on what each researcher thinks is important and what question they are ultimately trying to answer. This may depend on the researcher’s background, the audience for the work, and more. Doing this well requires a kind of taste and judgment that is difficult to specify explicitly, which makes it a much harder AI-assistance problem than tasks with verifiable answers (e.g., “did the code compile?”).
Moreover, in qualitative analysis, the evaluation criteria themselves evolve throughout the workflow. Researchers often discover what matters by interacting with the data over many rounds of...