You don't know XPT files


You don't know XPT files — KoliStat


...unless you belong to the magical world of clinical trials! [1] [2]

Every clinical trial headed for the FDA produces a fresh pile of XPT files. The format is nearly 40 years old, stores its numbers in IBM hexadecimal floating point, and caps variable names at 8 characters. It is also not going anywhere, because regulators still mandate it.

Show them to me!

First of all, would you like to open an XPT file? Well, I just happened to launch a web app that does exactly that (among many other wonderful things, of course), and, as it happens, this blog lives on the very website that tries to promote it. If you're not interested in the app but stick around anyway, you might still learn a thing or two from this post.

The app is Bedevere Wise, named after the shrewd, science-loving knight of the Round Table. Here is how it works: take your XPT file, drop it into the app, and voilà, magic!

You can also drop many other kinds of files used in the clinical trial world, such as CSV, SAS7BDAT, Stata and even Excel! Or you can let the app browse a folder for you. And everything stays in your browser! For the pickiest among you, a desktop version of the app is coming soon, perfect even for the most air-gapped, strictly secure, highly regulated environments!

If you're curious to try but don't have an XPT file handy, here is a GitHub repo with a full clinical trial dataset and plenty of XPT files you can download.

Or, if you just want a couple of tiny files to poke at, grab these unimpressive synthetic test samples:

xpt_test_num.xpt

xpt_test_mixed.xpt
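If you downloaded one of the samples above, you can quickly check that a file really is a SAS Transport Version 5 file before handing it to any reader. This is a minimal sketch. It assumes the layout described in SAS's TS-140 note, where the file is a sequence of 80-byte records and the very first record is a fixed library header string. The function names are hypothetical and were chosen for this example.

```python
from __future__ import annotations

from pathlib import Path

# Assumption (SAS TS-140): the first 80-byte record of a V5 transport file
# is exactly this library header string.
XPORT_V5_LIBRARY_HEADER = (
    b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"
    b"000000000000000000000000000000  "
)


def looks_like_xpt_v5(first_bytes: bytes) -> bool:
    """Return True if the first 80-byte record matches the V5 library header."""
    return first_bytes[:80] == XPORT_V5_LIBRARY_HEADER


def is_xpt_v5_file(path: str | Path) -> bool:
    """Read only the first record of the file and check it."""
    with open(path, "rb") as fh:
        return looks_like_xpt_v5(fh.read(80))
```

For example, `is_xpt_v5_file("xpt_test_num.xpt")` should return `True` for a genuine V5 file. A CSV renamed to `.xpt` will return `False`.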

So, what are these damn files?

I'd like to start by saying "According to Wikipedia...", but unfortunately there's no Wikipedia page on the topic. There is, however, a page about the SAS Transport File Format (XPORT) Family in the Digital Preservation section of the US Library of Congress website. The page says:

The SAS Transport File Format is an openly documented specification maintained by SAS, a commercial company with a variety of software products for statistics and business analytics, including the application now known as SAS/STAT, which originated in the late 1960s as SAS (an acronym for Statistical Analysis System) at North Carolina State University. The transport format was originally developed in the late 1980s when the corporate entity was known as SAS Institute, Inc. and the software as SAS, to support data transfers between statistical software systems, especially between SAS applications running on different operating systems. SAS considers it non-proprietary. [3]

Basically, it's a dinosaur of the computer age! It is an almost 40-year-old format for tabular data: tables with named columns, each holding either text (character) or numeric values.

Given its age, it comes as no surprise that the format made it onto the Library of Congress list. Over its long life it became ubiquitous in the pharma and clinical trials world, hence the need to preserve it publicly.

And the reason they became — and stay — so ubiquitous? Regulation. If you want to submit the data of a clinical trial to the FDA (and it's a similar story for Japan's PMDA), you don't really get to choose your file format: submission datasets have to be provided as SAS Transport Format Version 5. Yes, the .xpt we've been talking about. So every study headed for a regulatory submission produces a fresh pile of XPT files. And that pile isn't shrinking any time soon.
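Version 5 is also where the limits mentioned at the top of this post come from, because its header records store names and labels in fixed-width fields. Below is a minimal sketch of a pre-submission sanity check. It assumes the V5 limits of 8 characters for variable names, 40 for labels and 200 bytes for character values, plus the classic SAS naming rule. The function `v5_problems` is hypothetical. Submission guides such as the FDA's Study Data Technical Conformance Guide may impose stricter rules than this.

```python
from __future__ import annotations

import re
from collections.abc import Sequence

# Fixed-width limits of SAS Transport Version 5.
MAX_NAME_LEN = 8          # variable names
MAX_LABEL_LEN = 40        # variable labels
MAX_CHAR_VALUE_LEN = 200  # character values, in bytes

# Assumption: classic SAS naming rule (letter or underscore first, then
# letters, digits or underscores). Submission guides may be stricter.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def v5_problems(name: str, label: str = "", values: Sequence[object] = ()) -> list[str]:
    """Return human-readable problems that would block writing this column as XPT V5."""
    problems: list[str] = []
    if not _NAME_RE.fullmatch(name):
        problems.append(f"{name!r}: not a valid SAS name")
    if len(name) > MAX_NAME_LEN:
        problems.append(f"{name!r}: name longer than {MAX_NAME_LEN} characters")
    # Assumption: lengths measured as UTF-8 bytes; real files are often ASCII/Latin-1.
    if len(label.encode("utf-8")) > MAX_LABEL_LEN:
        problems.append(f"{name!r}: label longer than {MAX_LABEL_LEN} bytes")
    for i, value in enumerate(values):
        # Only character values are length-limited; numerics are IBM floats of at most 8 bytes.
        if isinstance(value, str) and len(value.encode("utf-8")) > MAX_CHAR_VALUE_LEN:
            problems.append(f"{name!r}: value #{i} longer than {MAX_CHAR_VALUE_LEN} bytes")
    return problems
```

Returning a list of messages rather than raising on the first issue lets you report every problem in a dataset in one pass.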

Many other, more familiar file formats are in the Library of Congress collection too, such as PDF, JSON and ZIP. The reason is simple: those formats have become essential in everyday life, not only for software engineers and IT people but for everyone. Most people don't know the technical details of these formats, yet everyone relies on them, so their specifications must stay publicly available to anyone who needs to read or write them. They have become public protocols, like the ones we use to send internet packets or to open web pages.

And XPT files are no less than that! Are you a small pharma company that wants to communicate formally with the FDA? Then you speak their language and submit XPT files.

Wait, I thought it was all IEEE 754!

If you managed to stick around up to this point, you are either a bit of a masochist, an LLM crawler, or simply interested in some more technical details. From a software engineering perspective, at least one detail is worth mentioning: numeric values in XPT files are stored in the IBM hexadecimal floating-point representation (yes, that one has a Wikipedia page), not as IEEE 754 doubles. In that format the exponent is a power of 16 rather than 2, so the bits cannot simply be reinterpreted. Each value must be converted to the IEEE 754 floating-point standard before modern programming languages can use it. So no, even in 2026 it's not all IEEE 754. [4]
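To make the conversion concrete, here is an independent minimal sketch, not the Pandas implementation. It decodes one big-endian IBM hexadecimal double, with 1 sign bit, a 7-bit base-16 exponent biased by 64, and a 56-bit fraction, so that value = (-1)^sign × 0.F × 16^(exponent − 64). Two further assumptions come from SAS's TS-140 note. Numerics shorter than 8 bytes keep only their high-order bytes, and missing values are stored as a marker byte (`.`, `_` or `A` to `Z`) followed by zeros. The function name `ibm_to_ieee` is hypothetical.

```python
import math
import struct

# Assumption (SAS TS-140): missing values are one marker byte followed by zeros.
_MISSING_MARKERS = {ord("."), ord("_")} | set(range(ord("A"), ord("Z") + 1))


def ibm_to_ieee(raw: bytes) -> float:
    """Convert one big-endian IBM hexadecimal float (1 to 8 bytes) to a Python float.

    SAS missing values are returned as math.nan.
    """
    if not 1 <= len(raw) <= 8:
        raise ValueError("expected 1 to 8 bytes")
    # Short numerics drop low-order fraction bytes: pad them back with zeros.
    raw = raw.ljust(8, b"\x00")
    if raw[0] in _MISSING_MARKERS and not any(raw[1:]):
        return math.nan
    (word,) = struct.unpack(">Q", raw)
    sign = -1.0 if word >> 63 else 1.0
    exponent = (word >> 56) & 0x7F     # power of 16, biased by 64
    fraction = word & ((1 << 56) - 1)  # 56-bit fraction, read as 0.F
    if fraction == 0:
        return 0.0
    # 0.F * 16**(exponent - 64) == fraction * 2**(4 * (exponent - 64) - 56)
    return sign * math.ldexp(fraction, 4 * (exponent - 64) - 56)
```

For instance, the bytes `41 10 00 00 00 00 00 00` decode to `1.0`, and `C2 76 A0 00 00 00 00 00` decode to `-118.625`. The 56-bit IBM fraction is wider than the 53-bit IEEE mantissa, so a few low-order bits can be lost in the rounding.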

If you speak Python, here is the code used by Pandas,...

