Research Taste After Implementation Becomes Cheap

Hanbo Xie


The Strange Feeling of AI-Era Research

A lot of research feels faster now.

We can ask AI to write code, clean data, summarize papers, generate labels, draft prompts, produce figures, compare models, and turn scattered notes into something readable. These are not small things. Many parts of research used to be slow simply because they were tedious.

But I also find myself having a strange reaction to some AI-era research. The work can look very complete. There is a dataset, a pipeline, a benchmark, a visualization, a model comparison, maybe even an agent loop. Everything is there.

And yet something still feels unresolved.

Usually the problem is not that the authors did nothing. They often did a lot. The problem is that visible work does not always answer the deeper question. What exactly was solved? Why does it matter? What would have counted as a nontrivial solution? What shortcuts were ruled out?

AI does not only make research easier to do. It makes research easier to stage.

Implementation can make a project look complete before the question has become clear.

Three Layers of Scientific Work

I find it useful to separate research into three layers.

01 Value: What is worth asking, and why does it matter?

02 Evaluation: What would count as solving it, and what shortcuts are ruled out?

03 Implementation: How do we build the study, model, benchmark, or system?

The first is the value layer: what problem is worth asking, and why does it matter? Battleday and Gershman distinguish between the "easy problem" of AI for science (solving well-specified optimization problems) and the "hard problem" (formulating the problem itself) [1]. A good research problem is not just doable. It should change how we understand something, or how the world can operate.

The second is the evaluation layer: what would count as solving the problem? What evidence would show that the answer is not a shortcut, artifact, or post-hoc story? This layer becomes especially important when the problem is under-specified. Understanding, explanation, mechanism, transfer, scientific discovery, and human-AI interaction do not come with natural loss functions.

The third is the implementation layer: how do we build the experiment, dataset, model, analysis, benchmark, or system?

All three layers matter. Implementation is not "just engineering." Without implementation, ideas remain empty. But the layers are not interchangeable. A strong implementation does not automatically answer an important question. A benchmark score does not automatically validate an evaluation. A polished artifact does not automatically imply conceptual progress.

What AI Makes Cheap

AI has transformed the third layer.

This is mostly good. It means we can try more ideas, build faster prototypes, analyze more data, and explore directions that would previously have been too costly. It lowers friction. It makes research less bottlenecked by boilerplate.

But it also changes how we should read research.

When implementation was expensive, visible effort often carried some signal. A large pipeline, a complex analysis, or a carefully assembled benchmark suggested real investment. Now that signal is weaker. AI makes it much easier to produce research-like artifacts: taxonomies, annotations, heatmaps, comparisons, summaries, and polished narratives.

That does not make these artifacts useless. It means we should ask more carefully what they show.

AI makes it easier to build. It does not automatically make it easier to know what is worth building.

As implementation gets cheaper, taste becomes the scarce resource.

When Value Claims Drift

One kind of drift happens at the value layer. A narrow result is attached to a much larger vision.

This often happens in areas like NeuroAI, human-AI comparison, and biologically inspired AI. A study may show that an artificial model resembles human behavior or neural responses in some setting. Another may build a small functional demo inspired by a cognitive or neural principle. These results can be interesting. Similarities can be surprising. Demos can be useful.

The question is how much weight the result should carry.

Showing that a model resembles the brain in one setting is not the same as explaining the brain. Showing that a model resembles human behavior is not the same as explaining cognition. Building a toy system inspired by a cognitive principle is not the same as showing a scalable design principle for AI.

The missing question is simple: what would make this matter beyond the demonstration itself?

If the goal is to inspire AI, what conditions would make the idea useful for AI systems? Does it improve robustness, efficiency, transfer, interpretability, or control? If the goal is to understand cognition, what phenomena does it organize? What alternatives does it rule out? What would the next few studies need to show?

A serious impact...
