What science can tell us about C and C++'s security · Alex Gaynor
Wed, May 27, 2020
There are not a lot of very strong empirical results in the field of programming languages. This is probably because there’s a huge number of variables to control for, and most of the subjects available to researchers are CS undergraduates. However, I have recently found a result replicated across numerous codebases, which as far as I can tell makes it one of the most robust findings in the field:
If you have a very large (millions of lines of code) codebase, written in a memory-unsafe programming language (such as C or C++), you can expect at least 65% of your security vulnerabilities to be caused by memory unsafety.
This result has been reproduced across:
Android (cite): “Our data shows that issues like use-after-free, double-free, and heap buffer overflows generally constitute more than 65% of High & Critical security bugs in Chrome and Android.”
Android’s Bluetooth and media components (cite): “Use-after-free (UAF), integer overflows, and out of bounds (OOB) reads/writes comprise 90% of vulnerabilities with OOB being the most common.”
iOS and macOS (cite): “Across the entirety of iOS 12 Apple has fixed 261 CVEs, 173 of which were memory unsafety. That’s 66.3% of all vulnerabilities.” and “Across the entirety of Mojave Apple has fixed 298 CVEs, 213 of which were memory unsafety. That’s 71.5% of all vulnerabilities.”
Chrome (cite): “The Chromium project finds that around 70% of our serious security bugs are memory safety problems.”
Microsoft (cite): “~70% of the vulnerabilities Microsoft assigns a CVE each year continue to be memory safety issues”
Firefox’s CSS subsystem (cite): “If we’d had a time machine and could have written this component in Rust from the start, 51 (73.9%) of these bugs would not have been possible.”
Ubuntu’s Linux kernel (cite): “65% of CVEs behind the last six months of Ubuntu security updates to the Linux kernel have been memory unsafety.”
And these numbers are in line with what we’ve seen in 0days that have been discovered being exploited.
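As a quick sanity check, the percentages in the Apple figures quoted above can be recomputed from the raw CVE counts. This is a minimal sketch: the counts come from the quotes, while the dictionary and function names are illustrative and not taken from the cited analysis.

```python
# (memory-unsafety CVEs, total CVEs), as quoted above for iOS 12 and macOS Mojave.
quoted_counts = {
    "iOS 12": (173, 261),
    "macOS Mojave": (213, 298),
}


def memory_unsafety_share(unsafe: int, total: int) -> float:
    """Percentage of CVEs attributed to memory unsafety, rounded to one decimal."""
    return round(100 * unsafe / total, 1)


shares = {
    name: memory_unsafety_share(unsafe, total)
    for name, (unsafe, total) in quoted_counts.items()
}

for name, share in shares.items():
    # Both values land above the 65% lower bound stated in the finding.
    print(f"{name}: {share}% of CVEs were memory unsafety")
```

Both recomputed shares match the quoted 66.3% and 71.5%.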
This observation has been reproduced across numerous very large codebases, built by different companies, started at different points in time, and using different development methodologies. I’m not aware of any counter-examples. The one thing they have in common is being written in a memory-unsafe programming language: C or C++.
Based on this evidence, I’m prepared to conclude that using memory-unsafe programming languages is bad for security. This would be an exciting result! Empirically demonstrated technical interventions to improve software are rare. And memory-unsafety vulnerabilities are one of the only kinds that we know how to completely eliminate, by choosing memory-safe languages. However, it’s critical we approach this question as rational empiricists, and see if the evidence really merits the conclusion that memory-unsafe programming languages are bad for security.
Let’s consider the Venn diagram of vulnerabilities:
There are vulnerabilities that can exist only in memory-unsafe languages (e.g. buffer overflows or use-after-frees)
There are vulnerabilities that can exist in any programming language (e.g. SQL injection or XSS)
There are vulnerabilities that can only exist in memory-safe languages (e.g. use of eval on untrusted inputs; eval tends to only exist in very high-level languages, which are all memory-safe)
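To make the three sets concrete, here is a small Python sketch showing how each one looks from inside a memory-safe language. The table, the attacker-controlled strings, and the variable names are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Set 1 (memory-unsafe languages only): an out-of-bounds read.
# In C, reading past the end of a buffer is undefined behavior and may expose
# adjacent memory. A memory-safe language checks the index and fails instead.
buffer = [0, 1, 2, 3]
try:
    buffer[10]
    out_of_bounds_result = "read adjacent memory"
except IndexError:
    out_of_bounds_result = "IndexError"

# Set 2 (any language): SQL injection through string concatenation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

untrusted_name = "nobody' OR '1'='1"  # hypothetical attacker-controlled input

# Vulnerable: the input is spliced into the query text and changes its meaning.
injected_rows = conn.execute(
    "SELECT name FROM users WHERE name = '" + untrusted_name + "'"
).fetchall()

# Safe: a parameterized query treats the input purely as data.
parameterized_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (untrusted_name,)
).fetchall()

# Set 3 (high-level, memory-safe languages): eval on untrusted input.
# The string below stands in for attacker input; it runs as ordinary code.
untrusted_expression = "__import__('os').getcwd()"
eval_result = eval(untrusted_expression)

print(out_of_bounds_result)
print(len(injected_rows), len(parameterized_rows))
print(type(eval_result).__name__)
```

The concatenated query returns every row in the table, while the parameterized one returns none. Memory safety does nothing to prevent the second and third cases.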
So the first set contains at least 65% of the vulnerabilities in these types of codebases, and since the third set is empty for code written in C or C++, the second set must contain the remainder: at most 35%. So if we change programming language to something memory-safe, we get rid of at least 65% of our vulnerabilities. But does the magnitude of the other sets change?
I posit that the second set stays the same size: there’s no reason or evidence to think that porting C++ to a memory-safe language results in additional SQL injection.
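Under that assumption, the arithmetic of a port can be sketched as follows. This is a back-of-the-envelope model, not a measurement: the function name is illustrative, and the value used for the third set is a made-up placeholder rather than a figure from any of the cited sources.

```python
def remaining_percent(memory_unsafe_percent: float,
                      safe_only_percent: float = 0.0) -> float:
    """Vulnerabilities left after a port, as a percentage of the original count.

    memory_unsafe_percent: size of the first set, eliminated by the port.
    safe_only_percent: size of the third set, newly possible after the port,
        expressed relative to the original total.
    The second set is assumed to stay the same size.
    """
    language_independent_percent = 100.0 - memory_unsafe_percent
    return language_independent_percent + safe_only_percent


# 65 is the lower bound from the finding above, so this is the worst case.
worst_case = remaining_percent(65.0)

# 1.0 is a hypothetical placeholder for the third set, for illustration only.
with_placeholder = remaining_percent(65.0, safe_only_percent=1.0)

print(f"Remaining if the third set is empty: {worst_case:.1f}%")
print(f"Remaining with a placeholder third set: {with_placeholder:.1f}%")
```

The port only fails to help if the third set grows to roughly the size of the first, which is why the size of the third set is the question that matters.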
Our third set is vulnerabilities that are specific to memory-safe languages. Actual use of eval in production code is incredibly rare in my experience; however, its cousin “unsafe deserialization” does occur in the real world. To investigate its frequency, I looked into Java’s unsafe deserialization on Android. Based on research I reviewed, Android as a whole appears to have had maybe a dozen of these. Basically every month it has more memory-unsafety issues than it’s had vulnerabilities of this class of all time. So I believe this class to be orders of magnitude...