Data Science Weekly – Issue 656

Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Issue #656 June 18, 2026

Hello! Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.

And now…let’s dive into some interesting links from this week.

Editor's Picks

Free SQL→ER diagram tool, runs in the browser, nothing uploaded
Paste a SQL schema (CREATE TABLE statements) → get a clean, interactive ER diagram. Open source and 100% local: it runs entirely in your browser, so your schema never leaves your machine. No server, no signup, no upload…

Preschoolers search semantic networks in a broader and more variable way than adults: Implications for hypothesis generation
We find that adults show greater dependencies between sequential guesses than preschoolers, and generate a less diverse set of options. These findings may support the idea that development can be viewed as analogous to simulated annealing strategies in machine learning that start “hot” (in early childhood), generating wider and more variable searches, and eventually cool (in adulthood) to generate narrower searches…
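
For readers who have not met simulated annealing, here is a minimal Python sketch of the "start hot, then cool" idea the abstract refers to. It is not taken from the paper: the function names, the geometric cooling schedule, and the choice to scale the proposal width by the temperature are all assumptions made for illustration.

```python
import math
import random


def temperature(step, steps, t_start=2.0, t_end=0.01):
    """Geometric cooling: t_start at step 0, t_end at the final step."""
    if steps < 2:
        return t_end
    return t_start * (t_end / t_start) ** (step / (steps - 1))


def anneal(f, x0, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Minimize a 1-D function f, returning the best (x, f(x)) seen."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for step in range(steps):
        t = temperature(step, steps, t_start, t_end)
        # Assumption: proposal width follows the temperature, so a hot
        # search jumps widely and a cool search explores only nearby points.
        candidate = x + rng.gauss(0.0, t)
        fc = f(candidate)
        delta = fc - fx
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f


def bumpy(x):
    """Hypothetical test function with several local minima."""
    return x * x + 3.0 * math.sin(5.0 * x)


best_x, best_f = anneal(bumpy, x0=5.0)
```

Early steps accept many worse guesses and wander far from the current point; late steps rarely do either, which is the "narrower search" half of the analogy.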

New CRAN Packages: signal or noise?
CRAN continues to be the most accessible repository for statistical knowledge on the planet, and the number of new packages being accepted by CRAN is growing faster than ever. But, is the R community really benefiting from this new growth?…

Data Science Articles & Videos

The 90-year-old idea behind JEPA models: Canonical Correlation Analysis (CCA)
Harold Hotelling’s 1936 Canonical Correlation Analysis (CCA) (in modern terminology, “CCA is used to find a common signal among two large matrices”) forms the theoretical and intuitive foundation for modern embedding prediction techniques, including JEPA models…
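
To make "a common signal among two large matrices" concrete, here is a small NumPy sketch of the textbook CCA computation: whiten each matrix, then take the SVD of the whitened cross-covariance. It is not code from the linked article; the function name `first_canonical_pair` and the small ridge term `reg` are assumptions added for illustration and numerical stability.

```python
import numpy as np


def first_canonical_pair(X, Y, reg=1e-8):
    """Return (rho, a, b): the top canonical correlation and its directions.

    X is (n, p) and Y is (n, q); rows are paired observations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariances; reg * I is an assumed ridge to keep them invertible.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten with Cholesky factors: M = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T

    # Singular values of M are the canonical correlations.
    U, s, Vt = np.linalg.svd(M)

    # Map the top singular vectors back to the original coordinates.
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return s[0], a, b
```

The projections `X @ a` and `Y @ b` are the pair of one-dimensional summaries of the two matrices that correlate most strongly, which is the "common signal" in the quote above.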

2026 Data Science Tech Stack at your Job [Reddit]
What is your current tech stack at your job?

A Beginner’s Guide to Robotics Hardware
Building an open-source robot begins in a way that is familiar to assembling IKEA furniture…The similarity ends once the robot is powered on. A bookshelf is designed to stay exactly as assembled, whereas a robot has to move while remaining correct about its own position and the state of its surroundings…When designing and building robots, this need for correctness both numerically and temporally is a key consideration…The rest of this post looks at how that difference manifests, using a common framing in robotics that divides the hardware into three parts: the movement, the body, and the sensor…

PyData London 2026 Talks
All the talks from PyData London 2026 are now available…

Data Engineering Acquisitions (2022-2026)
Consolidation in the Data Engineering market is happening quickly. Tools from the Modern Data Stack get unified into bigger Data Platforms. This note highlights the latest acquisitions across data engineering. It serves as an overview of the latest consolidations. Find attached the acquisition overview from 2022 to today…

The software industry: annealing, but wrong
In recent months I’ve heard of several teams with an interesting policy: each pull request should be no more than a few files, and no more than a certain number of lines (say 500). And do just one thing and do it well. And be easy for a human to review. And be fully tested by the test suite…And often, the results are good. Sure, splitting a single 6000-line feature or fix into twelve 500-line PRs is more work, but each of those PRs is surely easier to review. And you can git bisect them when there’s a bug! And maybe revert the individual change that broke something. ...and also cause 12x as many context switches for your reviewers as they review each one sequentially. But that’s just the cost of software quality! Right?…

AI and Survey Sampling Problems
My previous post discussed the performance of the artificial intelligence (AI) interface Gemini on undergraduate statistics problems. Now let’s look at how Gemini answers some of the problems in my sampling textbook (Lohr, 2022), and talk about how Gemini could help students learn sampling…

Test Doubles Taxonomy for R: Dummy, Stub, Spy, Mock, Fake
You might call them all “mock”…Mock the database. Mock the API. Mock the function. The word becomes a catch-all for any test double, any object you substitute for a real dependency in a test. Lumping them together makes it harder to choose the right tool, and the wrong choice leads to brittle, misleading tests. There are five distinct types, each with a specific job. Knowing which is which is how you stop writing tests that do the wrong thing…
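
As a quick illustration of the five types, here is a sketch in Python rather than R (the linked post covers R tooling; Python is used here only to keep this issue's examples in one language). The definitions follow the widely used xUnit-patterns vocabulary and may differ in detail from the article's. Every class and function name below is hypothetical.

```python
class DummyLogger:
    """Dummy: fills a required parameter slot; the test never expects it to run."""

    def log(self, message):
        raise AssertionError("dummy was not supposed to be used")


class StubRates:
    """Stub: returns a canned answer so the test controls its inputs."""

    def get_rate(self, currency):
        return 2.0


class SpyMailer:
    """Spy: records calls so the test can inspect them afterwards."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


class MockMailer:
    """Mock: carries its own expectations and verifies them itself."""

    def __init__(self, expected_to):
        self.expected_to = expected_to
        self.calls = 0

    def send(self, to, body):
        assert to == self.expected_to, f"unexpected recipient: {to}"
        self.calls += 1

    def verify(self):
        assert self.calls == 1, f"expected 1 send, got {self.calls}"


class FakeUserStore:
    """Fake: a working but simplified implementation (in-memory, not a database)."""

    def __init__(self):
        self._emails = {}

    def save(self, user_id, email):
        self._emails[user_id] = email

    def find(self, user_id):
        return self._emails.get(user_id)


def notify_total(user_id, amount, store, rates, mailer, logger):
    """Code under test: look up the user, convert the amount, send an email."""
    email = store.find(user_id)
    if email is None:
        logger.log(f"unknown user {user_id}")
        return None
    total = amount * rates.get_rate("EUR")
    mailer.send(email, f"Total: {total:.2f}")
    return total


store = FakeUserStore()
store.save(1, "a@example.com")
spy = SpyMailer()
total = notify_total(1, 10.0, store, StubRates(), spy, DummyLogger())
```

The spy and the mock replace the same dependency but differ in where the assertion lives: with the spy, the test checks `spy.sent` after the fact; with the mock, the double itself fails if it is called in a way it did not expect.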

Fractional Brownian Motion
A Brownian Motion (BM), without the “fractional” part, is a motion where the position of a given object...
