Please Switch to Python. Or R. Or Anything. Just Not Stata, SAS, SPSS, or Matlab

The Present of Coding

Abigail Haddad
Jan 27, 2026

A few weeks ago, new federal workforce data came out from the Office of Personnel Management (OPM). There’s a drag-and-drop tool for pre-built visualizations, but they also made over seven hundred monthly raw data files available for download. How? You click the download button. One at a time.[1]

This is fine if you only want a few files. But I wanted hundreds of them, for myself and to make available for other people with questions about this data. Downloading them manually means leaving no record of what you did, and doing it all over again if the data updates from Version 1 to Version 2. And if you make a mistake, are you going to realize it?
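One way to get the record that manual clicking doesn’t leave is a manifest: a file mapping each downloaded file to a hash of its contents. The sketch below illustrates that idea under my own assumptions; it is not the author’s code, and the function names are hypothetical. Comparing the manifest from one run with the next shows which files were added, removed, or silently changed between versions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir, with / separators) to its hash."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report which files were added, removed, or changed between two runs."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

After a run, you might save `build_manifest(Path("opm_data"))` to a JSON file with `json.dumps` and commit it alongside the code (`opm_data` is a placeholder directory name). On the next run, `diff_manifests(previous, current)` tells you exactly what moved.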

I didn’t pull it manually, but I still got the data. I’ll explain how in a moment. But first, some context.

Stata, SAS, SPSS, and MATLAB are proprietary tools for statistical and mathematical analysis. You write code in them, though they have point-and-click interfaces as well. They’ve been around for decades, they’re taught in graduate programs, and a lot of people use them for analytical work.

When I say you should switch to Python from those tools, I’m using Python as a stand-in for something broader: open-source, general-purpose programming languages. R, JavaScript, whatever. The point here isn’t that Python is the One True Language.[2] You might be able to do everything you need in R, or you might, as is increasingly common, combine several languages.

But Python is the biggest and most general-purpose of these languages and the one I use most, so that’s what I’ll focus on. The argument is really: join the world where tools work together, you can build anything you want, and you can share what you built. AI coding assistance is making this much easier: switching costs have fallen, so there’s even less reason to stay where you are than there was a year ago.

The OPM data

The data was released on a Thursday. By the time I could look at it, I’d already done my full-time job, run a meetup, and come home. I wanted to get this data up before the next morning, when I was hosting a call to share initial thoughts and go over what was available in the drag-and-drop tools vs. the raw data.

I also wanted to sleep.

Because of how the OPM site loads, downloading the data automatically required something that could control a web browser. There’s a Python library called Playwright that does this beautifully: anything you can do manually in Chrome, it can do with a script. A couple of hours later, I had the files I needed, uploaded to Hugging Face, where they can be easily downloaded or worked with in place. I included a Jupyter notebook on GitHub demonstrating how to pull from Hugging Face and analyze the data. Claude Code helped enormously with every step of this.

Besides just the data, what does this get me? A workflow that’s transparent: every step is shared on GitHub. It’s repeatable: other people can run it. I can build on it: when new versions come out, I can tweak the code. If I wanted, I could set up a GitHub Action to check daily for updates automatically and upload them to Hugging Face.
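A minimal Playwright sketch of the “click every download button” loop might look like this. It is an illustration, not the script the author wrote: it assumes the files are exposed as links that trigger a browser download, the URL and CSS selector are placeholders because the post doesn’t give them, and the real OPM page may need different waiting or navigation logic. It also assumes you’ve run `pip install playwright` and `playwright install chromium`.

```python
from pathlib import Path

# Placeholders (assumptions): the post doesn't give the OPM page's URL or markup.
LISTING_URL = "https://example.com/data-downloads"
LINK_SELECTOR = "a.download-link"


def target_path(dest_dir: Path, suggested_filename: str) -> Path:
    """Keep only the final name component so a download can't escape dest_dir."""
    return dest_dir / Path(suggested_filename).name


def download_all(url: str, selector: str, dest_dir: Path) -> list[Path]:
    """Open the page in headless Chromium, click each matching link, save each download."""
    # Imported inside the function so this module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    dest_dir.mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        # Wait for network activity to settle, in case the links are rendered by JavaScript.
        page.wait_for_load_state("networkidle")
        links = page.locator(selector)
        for i in range(links.count()):
            # Start listening for the download before the click that triggers it.
            with page.expect_download() as download_info:
                links.nth(i).click()
            download = download_info.value
            target = target_path(dest_dir, download.suggested_filename)
            download.save_as(target)  # overwrites any copy from an earlier run
            saved.append(target)
        browser.close()
    return saved
```

Each run re-downloads and overwrites the local copies, so the same script picks up a new data version without any extra clicking. A call would look like `download_all(LISTING_URL, LINK_SELECTOR, Path("opm_data"))`, where `opm_data` is a placeholder directory.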

This site of mine updates daily via GitHub Actions to pull new data from federal GitHub accounts.

Last week, a reporter flagged a possible issue in the data. I replicated it, built a notebook reproducing the issue, and sent a link to OPM. They can run it themselves, because Google Colab makes that free for anyone. And they can follow what I did, because they have employees who know Python.

This is the infrastructure that exists. If you’re using Python.

You can’t do that in Stata, SAS, SPSS, or MATLAB

Browser automation? Stata can’t. SAS can’t. SPSS can’t. MATLAB’s answer is “call Python from MATLAB.”

Upload the data to Hugging Face? You’re going to be stuck using the GUI to drag and drop it.

Create a notebook anyone can run for free online? None of these allow that.

Put it on GitHub Actions to run automatically? Python comes pre-installed on GitHub-hosted runners and can be run for free. None of these proprietary tools have that.

The pattern: even where these tools have some capability, it requires calling Python, expensive licenses, or both. You only get the tools someone built for you, and those are narrower, because you’re a smaller and more specialized group.

What else opens up

Once you’re in Python, what counts as “data” expands dramatically.

Data no longer has to be a CSV or an API call wrapped up in a nice bow. It can be PDFs you found by automating Google searches. It can be Excel files someone built expecting people to go through them by hand, but you want years of them, you want to process their different formats and tabs, and you want a pipeline that grabs new ones and alerts you if they don’t match what...
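The post is cut off here, but the idea of a pipeline that flags new files that don’t look like the old ones can be sketched as a simple header check. This is one hypothetical reading, not the author’s pipeline: the expected column names are invented for illustration, and the check compares only header names, not the contents of the cells.

```python
from collections.abc import Iterable

# Hypothetical layout, invented for illustration; use the columns of a file you've checked by hand.
EXPECTED_COLUMNS = ("agency", "month", "headcount")


def check_columns(
    source: str, columns: Iterable[object], expected: Iterable[str] = EXPECTED_COLUMNS
) -> list[str]:
    """Describe how a file's header differs from the expected layout; [] means it matches."""
    want = [c.strip().lower() for c in expected]
    got = [str(c).strip().lower() for c in columns]
    missing = [c for c in want if c not in got]
    extra = [c for c in got if c not in want]

    problems = []
    if missing:
        problems.append(f"{source}: missing columns {missing}")
    if extra:
        problems.append(f"{source}: unexpected columns {extra}")
    # Same names in a different order can still break positional processing.
    if not missing and not extra and got != want:
        problems.append(f"{source}: columns reordered {got}")
    return problems
```

With pandas, `pd.read_excel(path, sheet_name=None)` returns a dict of DataFrames, one per tab, so you could call `check_columns(f"{path}:{tab}", df.columns)` for each tab and send yourself any non-empty result.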
