Designing a Better strcpy (2020)



Like them or not, null-terminated strings are essential to C, and working with them is necessary in all but the most trivial programs. While C-style strings are a fundamental part of using the language, manipulating them is a common source of security bugs and lost performance. One of the most common operations is copying a string from one buffer to another, and there are a variety of string functions that claim to do this in C. Anecdotally, however, there is much confusion about what they actually do, and many people desire a string copying function with the following properties:

1. The function should accept a null-terminated source string, a destination buffer, and an integer representing the size of the destination buffer.

2. Upon return the function should ensure that the destination buffer contains a null-terminated string holding a prefix of the source string whenever possible (specifically, whenever the destination buffer has a non-zero size), to avoid future issues with unterminated strings. (While string truncation has its own issues, it is often a fairly reasonable fallback.)

3. The function should indicate how many characters it copied from the source, as well as whether an overflow occurred. (This allows for dealing with the overflow, if desired.)

4. The function should be efficient, and it should not read or write memory that it does not have to. These go partially hand-in-hand: the function should run in a single pass, should not write to the destination buffer past the NUL byte it places, and should not read characters from the source string once it has determined that the destination buffer is full. Ideally, the implementation would be vectorizable (relaxing some of the previous constraints slightly to within platform alignment guarantees).

5. The function should be standardized, so that it may be used portably across systems. Conformance to ISO C or POSIX.1 is generally the most desirable.

That is, what is often necessary is the function below, which we’ll call strxcpy:

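char *strxcpy(char *restrict dst, const char *restrict src, size_t len) {
    /* A zero-sized buffer cannot even hold the terminator. */
    if (!len) {
        return NULL;
    }

    /* Copy until the source's NUL has been copied or one byte of space remains. */
    while (--len && (*dst++ = *src++))
        ;

    if (!len) {
        /* Out of space: terminate here, then check whether src ended too. */
        *dst++ = '\0';
        return *src ? NULL : dst;
    } else {
        /* The NUL was copied by the loop; dst already points past it. */
        return dst;
    }
}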

This function meets every requirement above except standardization: it will copy the smaller of strlen(src) or len - 1 bytes from src to dst and cap the copy with a NUL character. In the case where src fits in dst, it will return a pointer past the NUL byte it placed; otherwise it returns NULL to indicate a truncation (or, when len is zero, that nothing could be written at all). While current compilers seem to have trouble with its control flow, it should also be fairly straightforward to vectorize, as the core loop is somewhat similar to a combination of strncpy and strlen.
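To make the return convention concrete, here is a minimal sketch of how a caller might turn strxcpy’s result into a character count. strxcpy is repeated from above so the example stands alone; copy_checked is a hypothetical wrapper invented for illustration and is not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* strxcpy as defined in this article, repeated so the example is self-contained. */
char *strxcpy(char *restrict dst, const char *restrict src, size_t len) {
    if (!len) {
        return NULL;
    }
    while (--len && (*dst++ = *src++))
        ;
    if (!len) {
        *dst++ = '\0';
        return *src ? NULL : dst;
    } else {
        return dst;
    }
}

/* Hypothetical wrapper (an assumption, not from the article): returns the
 * number of characters copied, excluding the NUL, or -1 if src was
 * truncated or dst_size is zero. */
long copy_checked(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL);
    char *end = strxcpy(dst, src, dst_size);
    if (end == NULL) {
        return -1;
    }
    /* end points one past the NUL that terminates the copy. */
    return (long)(end - dst) - 1;
}
```

Copying "hello" into a 16-byte buffer would yield 5 here, while copying it into a 4-byte buffer would yield -1 and leave "hel" in the buffer, still terminated. Note that on truncation the NULL return does not say how much was copied; a caller that needs it can recover the count as dst_size - 1.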

With these requirements to refer back to, let’s take a look at a variety of copying routines and see if they can help us.

Note

To head off the usual concerns, we’ll assume that we must use C, and that we will be eschewing the various length-prefixed or aggregate string constructions available as third-party libraries. While using a different language can solve many of the issues in C besides the one mentioned here, it’s not always desirable or even possible to use one. In addition to the usual drawbacks of using third-party libraries, replacing null-terminated strings often adds syntactic overhead and causes incompatibilities with other code that has been designed to work with them.

Some commonly used string copying routines

strcpy

Summary

Signature

#include <string.h>

char *strcpy(char *restrict dst, const char *restrict src);

Standardization

strcpy conforms to ISO C90.

Notes

strcpy is the standard string copying function: it copies characters from src to dst, up to and including the first NUL byte encountered. If dst is too small to hold src, or if the two overlap, then the behavior of the program is undefined. dst is returned.

strcpy certainly fulfills requirement 2 and parts of 4: it will always write out a null-terminated string and it’ll do so quickly. However, it cannot perform bounds checks at all, so we can only use it if we already know that the source string, including its terminator, fits in the destination buffer; it fails requirement 1. It doesn’t tell us how many characters it wrote either, so it fails requirement 3 as well. It’s been part of C forever, so it does meet requirement 5.
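As a rough illustration of what that missing bounds check costs, the sketch below wraps strcpy in the check a caller would have to perform by hand. strcpy_if_fits is a hypothetical name made up for this example; the point is only that the check needs a separate strlen pass, which runs against the single-pass goal in requirement 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper (an assumption, not a standard function): copies src
 * into dst only if the whole string, including its NUL, fits. Returns true
 * on success; on failure dst is left untouched. */
bool strcpy_if_fits(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src);   /* first pass over src, just to measure it */
    if (n >= dst_size) {
        return false;         /* no room for n characters plus the NUL */
    }
    strcpy(dst, src);         /* second pass over src, to copy it */
    return true;
}
```

Unlike the strxcpy described earlier, this refuses to copy anything when the source is too long rather than truncating, and it always reads the entire source string even when the destination is tiny.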

strncpy

Summary

Signature

#include <string.h>

char *strncpy(char *restrict dst, const char *restrict src, size_t len);

Standardization

strncpy conforms to ISO C90.

Notes

strncpy copies up to len characters from src to dst. If src is shorter than len, then dst is NUL-padded to len characters. dst is returned.

strncpy takes the parameters we want, so it satisfies requirement 1; even in the face of an arbitrary source string it won’t exhibit undefined behavior, provided that we supply it with the correct destination buffer length. However, if the source is longer than the destination, the buffer will not be null-terminated, and if it is shorter strncpy will continue writing NUL bytes to the destination...
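Both behaviors can be observed directly. The following is a small sketch, using a hypothetical helper named nuls_after_strncpy, that pre-fills a buffer with a marker byte, runs strncpy, and counts how many NUL bytes end up inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (invented for this example): fills dst[0..len) with
 * 'x', runs strncpy(dst, src, len), and returns the number of NUL bytes
 * found in dst[0..len) afterwards. */
size_t nuls_after_strncpy(char *dst, const char *src, size_t len) {
    assert(dst != NULL && src != NULL);
    size_t count = 0;
    memset(dst, 'x', len);
    strncpy(dst, src, len);
    for (size_t i = 0; i < len; i++) {
        if (dst[i] == '\0') {
            count++;
        }
    }
    return count;
}
```

With an 8-byte destination, a source such as "toolongstring" should give a count of 0, meaning the buffer holds "toolongs" with no terminator at all. A source such as "hi" should give 6: the one NUL that ends the string plus five more bytes of padding that strncpy writes to fill out the buffer.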
