Linux Kills strncpy



Stephen Smith's Blog

Musings on Machine Learning…



Introduction

The C string library is compact, fast and efficient. However, if it is not used correctly and carefully, it leads to buffer overrun errors, which either crash programs or, worse, allow arbitrary code to execute. Hackers have found misuse of the C string functions to be a goldmine of exploitable security weaknesses. The Linux kernel is written in C and uses these string functions, and its developers have spent a lot of time fixing problems in the kernel's string usage and improving the string library's API.

The root of many of these problems is the strcpy(dest, src) function, which copies bytes from src to dest until it reaches a NUL terminator ('\0'). Nothing stops the copy if the src string is larger than the destination buffer, or if src isn't NUL-terminated at all. In either case the copy runs past the end of dest and overwrites whatever is in memory afterwards. If the buffer is on the stack, this lets an attacker overwrite the function's return address and set it to an address of the attacker's choosing.
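To make the failure condition concrete: strcpy() writes strlen(src) + 1 bytes and never learns how big dest is, so any size check is the caller's job. The sketch below uses a hypothetical helper, checked_copy(), which is not a kernel or libc function. It looks for the terminator within the first dest_size bytes of src using memchr() and refuses to copy if it isn't there, which covers both failure modes described above: a src that is too long, and one that isn't terminated within that range.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a kernel or libc API): copy src into dest only
 * if the whole string, including its NUL terminator, fits in dest_size
 * bytes. src must be a terminated string or have dest_size readable bytes.
 * Returns 0 on success, -1 where a plain strcpy() would have overrun dest. */
int checked_copy(char *dest, size_t dest_size, const char *src)
{
    /* memchr() stops at the first match, so it never reads past the
     * terminator of a short string. No NUL in the first dest_size bytes
     * means src is too long or unterminated in that range. */
    if (memchr(src, '\0', dest_size) == NULL)
        return -1;

    strcpy(dest, src);   /* safe here: the string is known to fit */
    return 0;
}
```

A 7-character string fits in an 8-byte buffer (7 characters plus the terminator); an 8-character string does not, and is rejected instead of overwriting the byte after the buffer.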

The original fix for this was strncpy(dest, src, n), where n is the size of the destination buffer, so copying stops after n bytes and memory beyond dest isn't overwritten. The problem is that when src is n or more characters long, the result in dest isn't NUL-terminated. This led to a lot of code of the form:

strncpy(dest, src, sizeof(dest));
dest[sizeof(dest) - 1] = '\0';

The problems this leads to are people forgetting the second statement, or getting the size wrong, such as leaving out the -1.
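The sketch below wraps that two-line idiom in a function so the bound is stated once; strncpy_terminated() is a hypothetical name used only for illustration. It takes the buffer size as a parameter because sizeof(dest) only gives the buffer size when dest is an array. For a pointer, sizeof gives the size of the pointer, which is another easy way to get the bound wrong.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around the strncpy() idiom above.
 * dest must point to a buffer of exactly dest_size bytes. */
void strncpy_terminated(char *dest, const char *src, size_t dest_size)
{
    if (dest_size == 0)
        return;                      /* no room even for a terminator */

    strncpy(dest, src, dest_size);   /* may leave dest unterminated */
    dest[dest_size - 1] = '\0';      /* the easily forgotten second step */
}
```

Without the second statement, copying "abcdef" into a 4-byte buffer with strncpy leaves the bytes "abcd" and no terminator at all; with it, the buffer holds the string "abc".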

strncpy also has a second feature: it always writes exactly n bytes. If src is shorter than n, it pads the rest of dest with NUL bytes. Most programmers don't even know it does this, and the padding is usually unnecessary work.
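Written out by hand, standard strncpy() behaviour looks roughly like the sketch below. strncpy_reference() is a hypothetical name, and this is not how any particular C library implements it; the point is the second loop. Copying "hi" with n = 10 writes 2 characters followed by 8 NUL bytes.

```c
#include <stddef.h>

/* Hand-written sketch of standard strncpy() semantics, to make the
 * padding visible. Not a libc or kernel implementation. */
char *strncpy_reference(char *dest, const char *src, size_t n)
{
    size_t i = 0;

    /* Copy characters until src's terminator or until n bytes are written.
     * If src is n or more characters long, no terminator is written. */
    for (; i < n && src[i] != '\0'; i++)
        dest[i] = src[i];

    /* Keep writing NUL bytes until exactly n bytes have been written.
     * This is the padding most callers never needed. */
    for (; i < n; i++)
        dest[i] = '\0';

    return dest;
}
```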

Removing strncpy Was a Lot of Work

It took six years and over 360 patches to remove every instance of strncpy from the Linux kernel. This was a fair amount of work, mostly because each call had to be examined and the correct replacement function chosen; it wasn't a simple search-and-replace. The new functions also have more informative return values, so a caller can check the return code to see what happened, for example whether the string was truncated.

A lot of the older memory and string libraries in the Linux kernel have Assembly Language versions for the various supported processors. I don’t see many for the new string functions, so perhaps there is a bit of room to produce even faster code with some clever Assembly Language.

What Replaces strncpy?

The new functions guarantee that the destination is a properly terminated string, and they separate overflow protection from the buffer-padding behaviour of strncpy. This makes it clearer what the code is doing and improves efficiency by skipping padding that isn't needed.

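The most common replacement is strscpy(dest, src, n), which guarantees that dest holds a NUL-terminated string, i.e. it always writes a terminating '\0' (as long as n is nonzero), truncating src if necessary. It does not do any buffer padding. It returns the number of characters copied, not counting the terminator, or -E2BIG if the src string was truncated.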
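The kernel's strscpy() isn't available to ordinary programs, so the sketch below reproduces the contract described above in userspace C. strscpy_sketch() is a hypothetical name and this is not the kernel source. It returns ptrdiff_t rather than the kernel's ssize_t to stay within ISO C, and it assumes E2BIG comes from <errno.h>, as it does on Linux.

```c
#include <errno.h>    /* E2BIG (POSIX; available on Linux) */
#include <stddef.h>

/* Userspace sketch of the strscpy() contract (hypothetical name):
 *  - writes at most count bytes to dest, always including a NUL
 *    terminator when count > 0;
 *  - never pads the rest of dest;
 *  - returns the number of characters copied (excluding the NUL),
 *    or -E2BIG if src did not fit and was truncated. */
ptrdiff_t strscpy_sketch(char *dest, const char *src, size_t count)
{
    size_t i;

    if (count == 0)
        return -E2BIG;            /* no room even for the terminator */

    for (i = 0; i < count - 1 && src[i] != '\0'; i++)
        dest[i] = src[i];
    dest[i] = '\0';               /* always terminate */

    /* Stopping while src still has characters means dest was full. */
    if (src[i] != '\0')
        return -E2BIG;
    return (ptrdiff_t)i;
}
```

A caller that only cares about truncation can test for a negative result, e.g. if (strscpy_sketch(name, input, sizeof name) < 0), with name and input being illustrative variables, and handle the error there. The strncpy idiom never offered that check.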

strscpy_pad() adds the buffer padding. The full list of related functions is below. Note that the mem* functions have always worked with fixed sizes and treat NUL characters like any other byte.
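The sketch below shows the same contract as strscpy() with the zero-fill added, again as a userspace illustration under a hypothetical name, not the kernel source. A typical reason to want the padding (an assumption about usage, not something the article states) is a fixed-size structure that is later copied to user space or written out, where leftover bytes after the terminator would otherwise leak old data.

```c
#include <errno.h>    /* E2BIG (POSIX; available on Linux) */
#include <stddef.h>
#include <string.h>

/* Userspace sketch of strscpy_pad()-style behaviour (hypothetical name):
 * terminate, return the length or -E2BIG like strscpy(), and also
 * zero-fill every byte of dest after the copied characters. */
ptrdiff_t strscpy_pad_sketch(char *dest, const char *src, size_t count)
{
    const char *nul;
    size_t len;

    if (count == 0)
        return -E2BIG;

    /* Find src's terminator within the first count bytes, if any. */
    nul = memchr(src, '\0', count);
    len = nul ? (size_t)(nul - src) : count - 1;   /* characters to copy */

    memcpy(dest, src, len);
    memset(dest + len, '\0', count - len);          /* terminator + padding */

    return nul ? (ptrdiff_t)len : -E2BIG;
}
```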

| Use case | Preferred function | Why it is clearer |
|---|---|---|
| Copy into a normal NUL-terminated destination string | strscpy() | Bounds the copy and guarantees termination when the destination size is nonzero. |
| Copy into a NUL-terminated string and zero-fill the rest | strscpy_pad() | Preserves padding intentionally rather than accidentally inheriting it from strncpy(). |
| Copy text into a fixed-width non-string memory field | strtomem_pad() | Makes it clear that the destination is memory, not a C string. |
| Copy a bounded value and explicitly pad unused space | memcpy_and_pad() | Separates byte-copy semantics from string semantics. |
| Copy a known number of bytes | memcpy() | Signals that no string termination or padding behaviour is needed. |
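For the fixed-width, non-string case in the table, the sketch below copies a string into a field that is not a C string, such as a space-padded name in a fixed-size record. strtomem_pad_sketch() is hypothetical: the kernel's strtomem_pad() is a macro that takes the size from a char array destination, whereas here the length is passed explicitly. No terminator is written, so when src fills the field every byte is data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace sketch of strtomem_pad()-style copying into a
 * fixed-width field of dest_len bytes that is NOT a C string.
 * Copies up to dest_len characters of src and fills the rest with pad;
 * no NUL terminator is added. */
void strtomem_pad_sketch(char *dest, size_t dest_len, const char *src, char pad)
{
    const char *nul = memchr(src, '\0', dest_len);
    size_t len = nul ? (size_t)(nul - src) : dest_len;

    memcpy(dest, src, len);                   /* data bytes only */
    memset(dest + len, pad, dest_len - len);  /* explicit padding */
}
```

memcpy_and_pad() has the same shape, except that the caller passes an explicit source byte count instead of relying on a terminator in src, which is what keeps byte-copy semantics separate from string semantics.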

Summary

The C programming language, along with its string library, has been around since 1972. Since then, nearly every operating system in use today has been written in C, along with compilers, system software, application software and games. Back in 1972 the goal was to be as fast and efficient as possible, and hackers weren't a concern since few computers were networked. The C string library must have had a lot going for it to be used as extensively as it has been. Over the following years, cracks in the API have appeared, and they have been surprisingly hard to fix adequately: you patch one place and a problem appears somewhere else. Hopefully with these changes the Linux kernel is that much more secure, while maintaining its vaunted efficiency. This does show the importance of good API design, and in spite of its problems, the C string library has lasted a long time. These tweaks seem to preserve the intent of the library while fixing the most obvious...

