The importance of free software to science [LWN.net]
June 4, 2025
This article was contributed by Lee Phillips
Free software plays a critical role in science, both in conducting research and in disseminating it. Aspects of software freedom are directly relevant to simulation, analysis, document preparation and preservation, security, reproducibility, and usability. Free software brings practical and specific advantages, beyond just its ideological roots, to science, while proprietary software comes with equally specific risks. As a practicing scientist, I would like to help others, scientists or not, see the benefits of free software in science.
Although there is an implicit philosophical stance here (that reproducibility and openness in science are desirable, for instance), it is simply a fact that a working scientist will use the best tools for the job, even if those might not strictly conform to the laudable goals of the free-software movement. It turns out that free software, by virtue of its freedom, is often the best tool for the job.
Reproducing results
Scientific progress depends, at its core, on reproducibility. Traditionally, this referred to the results of experiments: it should be possible to attempt their replication by following the procedures described in papers. In the case of a failure to replicate the results, there should be enough information in the paper to make that finding meaningful.
The use of computers in science adds some extra dimensions to this concept. If the conclusions depend on some complex data massaging using a computer program, another researcher should be able to run the same program on the original or new data. Simulations should be reproducible by running the identical simulation code. In both cases this implies access to, and the right to distribute, the relevant source code. A mere description of the algorithms used, or a mention of the name of a commercial software product, is not good enough to satisfy the demands of a meaningful attempt at replication.
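As an illustration of what "the same program on the original data" can mean in practice, here is a minimal Python sketch of a provenance record. The function names (`sha256_of`, `build_manifest`), the manifest fields, and the sample inputs are all hypothetical, invented for this example; it shows one possible way to let a second researcher confirm that they hold byte-identical code and data, not an established convention.

```python
import hashlib
import platform
import sys


def sha256_of(content: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(code: bytes, data: bytes) -> dict:
    """Record fingerprints of the analysis code and its input data,
    plus the interpreter that ran them (hypothetical manifest layout)."""
    return {
        "code_sha256": sha256_of(code),
        "data_sha256": sha256_of(data),
        "python_version": platform.python_version(),
        "platform": sys.platform,
    }


# Stand-ins for a real analysis script and data file (assumed contents).
example_code = b"print(sum(values) / len(values))\n"
example_data = b"1.0\n2.0\n3.0\n"

manifest = build_manifest(example_code, example_data)
for key, value in manifest.items():
    print(f"{key}: {value}")
```

A record like this only helps if the code and data it fingerprints can actually be obtained and redistributed, which is the point the paragraph above makes about source-code access.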
The source code alone is sometimes not enough. Since the details of the results of a calculation can depend on the compiler, the entire chain from source to machine code needs to be free to ensure reproducibility. This condition is automatically met for languages like Julia, Python, and R, whose interpreters and compilers are free software. For C, C++, and Fortran, the other currently popular languages for simulation and analysis, this is only sometimes the case. To get the best performance from Fortran simulations, for example, scientists often use commercial compilers provided by chip manufacturers.
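One reason results can depend on the compiler is that floating-point addition is not associative, so a compiler that reorders a sum can change the last bits of the answer. The following Python sketch uses explicit parentheses to stand in for two evaluation orders that different compilers or optimization settings might choose; the variable names are illustrative only.

```python
# Floating-point addition is not associative: the grouping of the same
# three terms changes the rounded result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```

The difference here is a single unit in the last place, but in a long-running simulation such discrepancies can accumulate or be amplified, which is why the full toolchain matters when trying to reproduce a result exactly.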
Document preparation and preservation
The forward march of science is recorded in papers that are collected on preprint servers (such as arXiv), posted on the home pages of scientists, and published in journals. It's obviously bad for science if future generations can't read these papers, or if a researcher can no longer open a manuscript after upgrading their word-processing software. Fortunately, the future readability of published papers is enabled by the adoption, by journals and preprint servers, of PDF as the universal standard format for the distribution of published work. This has been the case even with journals that request Microsoft Word files for manuscript submission.
PDF files are based on an open, versioned standard and will be readable into the foreseeable future with all of the formatting details preserved. This is essential in science, where communication is not merely through words but depends on figures, captions, typography, tables, and equations. Outside the world of scientific papers, HTML is by far the dominant markup language used for online communication. It has advantages over PDF in that simple documents take less bandwidth, HTML is more easily machine-readable and human-editable, and by default text flows to fit the reader's viewport. But this last advantage is an example of why HTML is not ideal for scientific communication: its flexibility means that documents can appear differently on different devices.
The final rendering of a web document is the result of interpretation of HTML and CSS by the browser. The display of mathematics typically depends on evolving JavaScript libraries, as well, so the author does not know whether the reader is seeing what was intended. The "P" in PDF stands for "portable": every reader sees the same thing, on every device, using the same fonts, which should...