Data Science Weekly – Issue 652

Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Issue #652 May 21, 2026

Hello! Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.

And now…let’s dive into some interesting links from this week.

Editor's Picks

What’s going on in computational neuroscience nowadays? (part 1)
A month ago I came back from Cosyne, the annual Computational and Systems Neuroscience conference…The days are a haze of tutorials, talks, poster sessions, and workshops, usually appended by dinners and drinks past midnight…I seem to have a hard time writing one-off pieces, so I’m leaning into that and writing this as a series. I’ll only be able to write a very narrow perspective of Cosyne, of course, but most of the talks are on YouTube on the official channel if you want to see for yourself. (That also means this will be more personal thoughts than report)…

Is logistic regression regression?
I came across a post recently by a machine learning engineer who made the bold claim that logistic regression is the worst name for an algorithm ever, or something along those lines…Many statisticians of the more old-school type seemed to disagree. This led me to think a bit more deeply about the subject. I’ve already written several posts on bad terminology in statistics (see confidence level, line of best fit, r squared) so I might have been expected to agree with the machine learning view, but in this case I agree with the statisticians, and I would like to explain why…

What Every Experimenter Must Know About Randomization
Randomized controlled experiments offer gold-standard insight into cause and effect, the knowledge that informs our most important decisions. Unfortunately, randomization in such experiments is often botched. Randomization errors silently invalidate the interpretation of experimental results, turning a fruitful quest for knowledge into a waste of time and money, or, worse, a wellspring of misinformation. Fortunately, these fatal errors are easy to spot and fix. So whether you’re a webmaster using A/B testing to increase engagement, a medical researcher evaluating vaccines, a factory manager exploring productivity improvements, or a scientist seeking the laws that govern nature or human affairs, read on…

Data Science Articles & Videos

Converting testthat Tests to testit
Back in 2013, I wrote about testing R packages when I first released testit. Thirteen years later, I still believe that unit testing should be nothing more than “tell me if something unexpected happened.” Recently I converted a large testthat test suite to testit, and I thought I’d share a practical guide for anyone considering the same move…

After 5 years in data science, I’m starting to realize most “insights” we deliver are completely ignored. Is this normal? [Reddit]
I’ve been in data science roles (both analytics and ML) for about 5 years now across a couple of companies. Lately I’ve been feeling a bit burned out because I keep seeing the same pattern…We spend weeks cleaning data, building dashboards, running statistical analysis, or training models… and then the stakeholders either say “thanks” and never use it, cherry-pick the numbers that support their existing opinion, or just completely ignore the findings and go with gut feel anyway. The worst part is when leadership asks for a “data-driven decision” but they’ve already decided what they want to do…Am I alone in this? Or is this just the reality of data science in most companies?…

Tagging my blog posts with BERTopic and LLMs
I recently added tags to my blog using BERTopic and a mix of LLMs. You can see the tags in the sidebar to the right (or in the footer on mobile). I’ve done this before in 2023, with GGUF Mistral using llama-cpp, but never finished the project. Now, because the models have been getting so good, and my project was small, relatively well-defined, and easy to evaluate, the project took me about 6-10 hours over a month, using BERTopic, Claude Code, and Pi with Deepseek…

What data science is actually about in the age of AI
I reflect on the evolving role of data scientists in the age of AI and LLMs. I argue that our core mission remains rigorous measurement, not full-stack development. While AI tools make building easier, the real value comes from defining and evaluating what truly matters. I share why measurement should be led by those closest to the problem and how data scientists can best contribute. Are we losing sight of what makes data science essential in the rush to build with AI?…

Transformer From Scratch
I’ve wanted to dive deeper into the fundamentals of AI for a while now - it feels a little bit...
