Is Logistic Regression Regression?


© 2026 Data Science Confidential. All rights reserved.


by Richard in predictive-models, 14 May 2026

I came across a post recently by a machine learning engineer who made the bold claim that logistic regression is the worst name for an algorithm ever, or something along those lines [1]. Many statisticians of the more old-school type seemed to disagree. This led me to think more deeply about the subject. I’ve already written several posts on bad terminology in statistics (see confidence level, line of best fit, r squared), so I might have been expected to agree with the machine learning view, but in this case I agree with the statisticians, and I would like to explain why.

What data scientists think regression is

In data science classes, students are taught that there are two kinds of predictive modelling. In both cases, the aim is to predict a response $Y$ given a vector of features $X$. If $Y$ is real-valued (numeric in R terminology) then it’s a regression problem. If $Y$ is categorical then it’s a classification problem. I’m not sure where this terminology originated, but it has certainly been propagated very widely by Hastie, Tibshirani and Friedman’s classic The Elements of Statistical Learning.

In logistic regression, your data consists of some feature values $X$ and a response $Y \in \lbrace 0, 1 \rbrace$. In this case, the response is definitely categorical, so someone trained in data science would indeed call this a classification problem. But if you look more closely at the output produced by logistic regression, its predicted values are numbers, namely the probability of each data point being in the class labelled $1$. You need to do something to these numbers (for example, use a cutoff) in order to get a predicted class.

For example, in R:

    set.seed(100)
    N <- 100
    a <- -1
    b <- 1
    x <- 2 * rnorm(N)

    # simulated binary data
    y <- rbinom(N, 1, 1/(1 + exp(-a - b * x)))

    # plot observed values in grey
    plot(x, y, pch=19, xlab="x", ylab="y",
         col=rgb(0, 0, 0, 0.3), las=1)

    # fit logistic regression
    model <- glm(y ~ x, family="binomial")

    # plot predicted values in red
    points(x,
           predict(model, data.frame(x=x),
                   type="response"),
           col=rgb(1, 0, 0, 0.3),
           pch=19)
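To make the cutoff step concrete, here is a minimal sketch that simulates the same data as the example above, fits the same model, and then turns the fitted probabilities into class labels. The 0.5 cutoff and the names `p_hat` and `y_hat` are illustrative assumptions; logistic regression itself does not prescribe a cutoff.

```r
set.seed(100)
N <- 100
a <- -1
b <- 1
x <- 2 * rnorm(N)
y <- rbinom(N, 1, 1 / (1 + exp(-a - b * x)))

model <- glm(y ~ x, family = "binomial")

# the model's own output: estimated P(Y = 1 | x), a number between 0 and 1
p_hat <- predict(model, type = "response")

# a separate decision step (assumed cutoff of 0.5) is needed to get a class
cutoff <- 0.5
y_hat <- as.integer(p_hat > cutoff)

# cross-tabulate observed classes against predicted classes
table(observed = y, predicted = y_hat)
```

Changing `cutoff` changes the predicted classes but leaves the fitted model untouched, which suggests the classification step sits on top of the regression rather than inside it.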

In fact, it’s quite hard to think of a machine learning algorithm which directly predicts class membership rather than some sort of measure of how strongly a data point is a member of a class. Even Naive Bayes is making some sort of attempt to predict the probability of class membership. The simplest algorithm which directly predicts the class instead of the probability of class membership is the 1-nearest neighbour algorithm. (But if you used a larger number of neighbours, say 20, you would get some sort of estimate of how confident you were in your prediction.)
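The nearest-neighbour point can be sketched in a few lines of base R. The helper `knn_share` and the query point `x0` are hypothetical names introduced for illustration, and the simulated data simply reuses the logistic model from earlier with a larger sample.

```r
set.seed(100)
N <- 200
x <- 2 * rnorm(N)
y <- rbinom(N, 1, 1 / (1 + exp(1 - x)))   # same model as before: a = -1, b = 1

# hypothetical helper: share of the k nearest training points with y = 1
knn_share <- function(x0, k) {
  nearest <- order(abs(x - x0))[1:k]
  mean(y[nearest])
}

x0 <- 0.5
share_1  <- knn_share(x0, k = 1)    # always exactly 0 or 1: a bare class label
share_20 <- knn_share(x0, k = 20)   # a multiple of 1/20: a rough probability
c(k1 = share_1, k20 = share_20)
```

With one neighbour the output can only be a class; with twenty it starts to look like a crude estimate of the probability of class membership.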

What statisticians think regression is

The term regression comes from Galton’s idea of regression to the mean (which I have written about here). Originally this was the observation that tall parents tend to have children who are shorter than them, and vice versa. The heights of children seem to regress towards the mean of the whole population.

More generally, the values of the response $Y$ corresponding to some fixed value of the features $x_0$ will follow some probability distribution. The mean of this distribution is $E[Y \vert x_0]$. The observed values of $Y$ will cluster around this mean. If you repeatedly draw values of $Y$, a large value will tend to be followed by a smaller value, and vice versa. Thus, $E[Y \vert X]$ will tend to be smaller than $Y$ if $Y$ is unusually large, and larger than $Y$ if $Y$ is unusually small [2]. You can see this if you use linear regression to predict $Y$ given $X$, as in the following example.

    set.seed(100)
    N <- 500

    x <- rnorm(N)
    y <- 0.4 * x + 0.8 * rnorm(N)
    plot(x, y)
    abline(coef(lm(y~x)), col="red")

(Note how the slope of the regression line is shallower than the “slope” which the eye perceives in the cloud of data points, which is the principal axis.)
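The difference between the two slopes can be checked numerically. The sketch below regenerates the same simulated data and compares the least-squares slope with the slope of the first principal axis, computed here with `prcomp`; the variable names are illustrative assumptions. For the population covariance implied by these parameters, the two slopes should be about 0.4 and roughly 0.78 respectively, with sample values varying around those.

```r
set.seed(100)
N <- 500
x <- rnorm(N)
y <- 0.4 * x + 0.8 * rnorm(N)

# least-squares slope: how the estimated E[Y | x] changes with x
ols_slope <- unname(coef(lm(y ~ x))[2])

# principal axis: direction of greatest variance in the (x, y) cloud
pc1 <- prcomp(cbind(x, y))$rotation[, 1]
pa_slope <- unname(pc1["y"] / pc1["x"])

c(ols = ols_slope, principal_axis = pa_slope)
```

The gap between the two numbers is the regression effect: the conditional mean moves less steeply than the cloud of points appears to.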

But some algorithms don’t give you any regression effect. For example, an overfitted decision tree (which here behaves like a 1-nearest-neighbour regressor) will not show any regression to the mean, as in the following example. Note that the blue line does not under- or over-predict for the extreme values of $x$.

    x <- c(1:9)
    y <- c(-10, seq(-1, 1, length.out=7), 10)
    pred_nn <- function(xx) y[which.min(abs(xx - x))[1]]

    plot(x, y)
    abline(coef(lm(y~x)), col="red")
    xx <- seq(1, 9, length.out=1000)
    lines(xx, sapply(xx, pred_nn), type="s", lty=2, col="blue")

In this case, you have an algorithm which is predicting a numerical value, so data scientists would call it a regression, but it’s not actually exhibiting any regression. How annoying!
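To put numbers on the contrast, this sketch rebuilds the same nine points and compares fitted values at the two extreme observations. The names `fit_lm` and `fit_nn` are illustrative assumptions. The linear fit should pull the extremes in to roughly ±6, while the 1-NN predictor returns the observed ±10 exactly.

```r
x <- 1:9
y <- c(-10, seq(-1, 1, length.out = 7), 10)
pred_nn <- function(xx) y[which.min(abs(xx - x))]

fit_lm <- unname(fitted(lm(y ~ x)))   # linear regression fitted values
fit_nn <- sapply(x, pred_nn)          # 1-NN fitted values at the training points

# fitted values at the two extreme points, x = 1 and x = 9
rbind(observed = y[c(1, 9)],
      linear   = fit_lm[c(1, 9)],
      one_nn   = fit_nn[c(1, 9)])
```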

What regression actually is

Although it’s too late to rewrite the textbooks, maybe it could be argued that regression and classification should have been defined in the following way. If a...
