'AI' Could Lead to a Rise in Research Slop



GenAI may make statistical abuse much easier to implement.

Nominal News, May 18, 2026


Nominal News is an economics newsletter written by a PhD Economist that translates the latest economic research into clear, policy-relevant insights on current issues.

At Nominal News, we often talk about economics research that establishes causality: for example, how immigrants affect innovation, or how free trade agreements affect people’s incomes and life expectancy. This causality is established using statistical methods, most often linear regression analysis.

Linear regression is an extremely powerful tool: nearly every existing economic policy can be linked to a research paper that uses linear regression to evaluate it. However, the tool can also be easily abused to make it seem like there is ‘causality’ when there isn’t. The rise of genAI could make it much easier to manipulate data and present false research.

Linear Regression and Statistical Significance

Linear regression is a statistical method that tells us about the relationship between two variables. For example: how does educational attainment affect income, how do vaccines affect survival, or how does adding one road lane change congestion?

To properly conduct a linear regression analysis, a researcher should take the following steps:

Stipulate a hypothesis – e.g. a vaccine prevents the disease;

Establish the “null” hypothesis – e.g. the vaccine does not prevent the disease;

Collect data;

Undertake a linear regression analysis;

Conclude by rejecting or failing to reject the null.

The final step is what determines whether we treat the result as evidence of a ‘causal’ relationship: if we ‘reject the null’, there is evidence for the alternative hypothesis, i.e. evidence that the vaccine does prevent the disease.

How Do We Reject/Fail to Reject the Null

So how exactly do we decide whether to reject the null hypothesis? We start from the assumption that the null hypothesis is true: in our example, that the vaccine does not prevent the disease. Once we have collected the data, we perform the linear regression analysis. The analysis tells us how likely we would be to see data like ours IF the null hypothesis were true. More precisely, the probability of observing a result at least as extreme as the one in our data, assuming the null hypothesis is true, is called the “p-value”.

In our example, suppose that in the collected data many of the vaccine recipients did not get the disease, while many non-vaccinated individuals did. If the null hypothesis were true, i.e. the vaccine does not prevent the disease, the probability of collecting such data would be very low, and we would get a low p-value.

The decision to reject or fail to reject the null hypothesis depends on this p-value. If the p-value is lower than 5%, we reject the null; otherwise, we fail to reject it. It is worth pointing out that there is no mathematical reason for the 5% cutoff; it is simply a widely agreed-upon scientific convention. A p-value of 5% tells us that, if the null were true, there would be a 5% probability of observing a result at least as extreme as ours.

Since a 5% probability is considered low, we ‘reject the null’. In the vaccine example, we would reject the hypothesis that the vaccine does not prevent the disease. A rejected null is often referred to as a ‘statistically significant’ result.
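As a rough illustration of the p-value logic above, the sketch below fits a regression slope to made-up vaccine data and asks how often chance alone would produce a slope at least as extreme. The data, the function names `ols_slope` and `permutation_p_value`, and the use of a permutation test (in place of the t-test a regression package would normally report) are all assumptions made for this example, not the method of any particular study.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Share of shuffled datasets whose slope is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffling the outcomes breaks any link between x and y,
        # which is exactly what the null hypothesis assumes.
        rng.shuffle(shuffled)
        # Small tolerance so floating-point rounding does not hide exact ties.
        if abs(ols_slope(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed data as one of the possible arrangements.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical data: x = 1 if vaccinated, y = 1 if the person got the disease.
vaccinated = [1] * 50 + [0] * 50
got_disease = [1] * 5 + [0] * 45 + [1] * 20 + [0] * 30

slope = ols_slope(vaccinated, got_disease)
p_value = permutation_p_value(vaccinated, got_disease)
print(f"slope = {slope:.2f}, p-value = {p_value:.4f}")
print("reject the null" if p_value < 0.05 else "fail to reject the null")
```

With this invented data the slope is -0.30, meaning vaccinated people were 30 percentage points less likely to get the disease, and the p-value comes out far below 5%, so the null is rejected.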
It is important to note, however, that when there is in fact no relationship, we will still wrongly claim statistical significance around 5% of the time. That is, in about 5% of such cases, we will have collected data that makes it look like there is a ‘causal relationship’ (a statistically significant one) between the variables in question when no such relationship exists.

‘P-Hacking’

This 5% cutoff is often crucial for researchers, who cannot claim to have found a result (for example, that the vaccine reduces disease) unless the p-value is lower than 5%. Naturally, this creates an incentive to push one’s p-value below 5%, potentially by manipulating the data. Such practices are colloquially referred to as ‘p-hacking’.

The most obvious way to ‘p-hack’ is simply to delete or change the collected data so that the p-value falls below 5%. This is a clear case of falsifying data. Such cases have occurred, including a recent high-profile scandal at Harvard involving exactly this type of data manipulation. However, there are also more subtle ways to ‘p-hack’.

Iterating on the Collected Data

One way to ‘p-hack’ on data is to look at subsets of the collected data. For example, focusing only on certain demographic groups within the...
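To see why searching through subsets is risky, here is a small simulation sketch in which the treatment truly has no effect. It compares the false-positive rate of one planned test with the rate obtained by testing ten subgroups and reporting whichever one "worked". The function names, the sample sizes, the assumption that subgroups are independent, and the use of a z-test with known variance (standing in for the regression t-test) are all simplifications chosen for illustration.

```python
import random
from statistics import NormalDist, fmean


def subgroup_p_value(rng, n_per_arm=40):
    """Two-sided p-value for treated vs. control when the true effect is zero."""
    treated = [rng.gauss(0, 1) for _ in range(n_per_arm)]
    control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
    # z-test with known unit variance (a simplifying assumption).
    z = (fmean(treated) - fmean(control)) / (2 / n_per_arm) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(n_studies=1000, n_subgroups=10, seed=1):
    """Return (rate for one planned test, rate when any subgroup may be reported)."""
    rng = random.Random(seed)
    single_hits = 0
    any_hits = 0
    for _ in range(n_studies):
        p_values = [subgroup_p_value(rng) for _ in range(n_subgroups)]
        single_hits += p_values[0] < 0.05   # one pre-specified test
        any_hits += min(p_values) < 0.05    # report whichever subgroup "worked"
    return single_hits / n_studies, any_hits / n_studies


single_rate, any_rate = false_positive_rates()
print(f"one planned test:        {single_rate:.1%} false positives")
print(f"best of 10 subgroups:    {any_rate:.1%} false positives")
print(f"theory for 10 subgroups: {1 - 0.95 ** 10:.1%}")
```

Under these assumptions, a single planned test is wrong about 5% of the time, while picking the best of ten independent subgroups produces a spurious ‘significant’ result roughly 40% of the time (1 - 0.95^10).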
