C array types are weird; and related topics
In this article I’ll explain what I find weird about them, what I’d do differently, and ramble on a few related things.
Technically speaking, an array type T[n] (for some n) is distinct from a pointer type T *. A value of type T[n] represents a contiguous sequence of T values in memory, n long.
But you can rarely work with a value of type T[n] directly. In almost every context, an expression of that type is immediately converted to a pointer of type T *, namely a pointer to the first element. This conversion is usually called "decay".
Since the array indexing operator arr[ix] actually operates on pointers, acting like *(arr + ix), you can basically treat arrays like pointers.
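To make the equivalence concrete, here is a minimal sketch. The function name indexing_spellings_agree is made up for illustration; it checks that the usual spelling, the pointer-arithmetic spelling, and the reversed spelling all read the same element.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: returns 1 if every spelling of indexing agrees.
int indexing_spellings_agree(void) {
    int arr[3] = {10, 20, 30};
    int *ptr = arr;  // arr decays to a pointer to its first element

    for (size_t ix = 0; ix < 3; ix++) {
        int usual = arr[ix];        // the familiar spelling
        int spelled = *(arr + ix);  // what arr[ix] is defined to mean
        int reversed = ix[arr];     // addition commutes, so this is valid too
        assert(ptr[ix] == usual);   // indexing the pointer works the same way
        if (usual != spelled || spelled != reversed) {
            return 0;
        }
    }
    return 1;
}
```

The reversed form ix[arr] is legal only because indexing is defined in terms of pointer addition; it is shown here as a curiosity, not as a recommendation.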
An important instance where this doesn’t happen is in sizeof arr, which returns sizeof(T) × n.
    int arr[3] = {10, 20, 30};
    int *arr_ptr = arr;
    size_t arr_size = sizeof(arr);
    size_t ptr_size = sizeof(arr_ptr);
    // These may (and likely will) be different
Additionally, in function signatures, any array type you give to an argument is actually interpreted as a pointer instead. The n denoting the size is completely discarded. That means that, as an exception to the exception, sizeof arr in a function with an argument T arr[n] will not evaluate to sizeof(T) × n.
    size_t foo(char buf[6]) {
        return sizeof(buf);  // buf is really a char *, so this is sizeof(char *)
    }
    char msg[6] = "!! ??";
    size_t msg_size = sizeof(msg);
    size_t msg_size_in_fn = foo(msg);
    // These may (and likely will) be different
Note that you can write char buf[static 8] to “enforce” the length, but this just makes it undefined behaviour if you pass a pointer to a shorter array. Similar to restrict, all it does is aid the compiler in optimisation.
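As a rough sketch of how static reads in practice (the names sum_first_six and sum_demo are invented for this example), the function below promises to read six elements, and the caller is responsible for supplying at least that many. The parameter is still an ordinary pointer, so passing a plain int * compiles just as well.

```c
#include <stddef.h>

// `static 6` is a promise made by the caller: v points to the first
// of at least six ints. The parameter's type is still `const int *`.
int sum_first_six(const int v[static 6]) {
    int total = 0;
    for (size_t i = 0; i < 6; i++) {
        total += v[i];
    }
    return total;
}

// Hypothetical caller, used only to exercise the function above.
int sum_demo(void) {
    int eight[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const int *as_pointer = eight;  // a plain pointer is accepted too

    // A longer array is fine; only the first six elements are read.
    // Passing an array of fewer than six ints would be undefined behaviour.
    return sum_first_six(eight) + sum_first_six(as_pointer);
}
```

Some compilers can warn when they see an obviously too-short array passed to such a parameter, but nothing in the language requires them to.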
Instead, you can use a pointer to the array as the argument. Rather than letting the array decay to T *, a pointer to the first element, you take the address of the whole array at the call site to get a T (*)[n]. The two pointers hold the same address at run-time, but the pointer-to-array type preserves the length information. It is inconvenient and confusing to write, though.
    size_t foo(char (*buf)[6]) {
        return sizeof(*buf);  // *buf has type char[6], so this is 6
    }
    char msg[6] = "?? !!";
    size_t msg_size = sizeof(msg);
    size_t msg_size_in_fn = foo(&msg);
    // These will be the same
Aside: Functions
Interestingly, there’s a second kind of type in C that behaves very similarly, but isn’t nearly as confusing: function types.
Like arrays, function values immediately coerce to function pointers. Unlike arrays, however, dereferencing a variable that refers to a function, e.g. *fn, does allow you to call that function in the same way as the plain symbol would.
    void foo() {}
    (*foo)();
    foo();
While writing &arr for an array does actually give you a pointer-to-array type T (*)[n], &fn is completely equivalent to fn. That’s because an array arr doesn’t decay to &arr, it decays to &arr[0], whereas a function fn does automatically convert to exactly &fn.
Note that neither arrays nor functions decay when they are the operand of the & operator, which is why &arr isn’t a pointer-to-pointer.
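The difference between &arr and &arr[0] can be sketched like this. The function name address_demo is hypothetical; it compares the two pointers and measures how far each one moves when incremented.

```c
#include <stddef.h>

// Hypothetical helper: returns 1 if all of the claims below hold.
int address_demo(void) {
    int arr[3] = {10, 20, 30};

    int *first = arr;        // decays: same as &arr[0]
    int (*whole)[3] = &arr;  // no decay under &: pointer to the whole array

    // Both pointers hold the same address...
    int same_address = ((void *)first == (void *)whole);

    // ...but stepping by one moves a different distance for each type.
    ptrdiff_t step_first = (char *)(first + 1) - (char *)first;
    ptrdiff_t step_whole = (char *)(whole + 1) - (char *)whole;

    return same_address
        && step_first == (ptrdiff_t)sizeof(int)
        && step_whole == (ptrdiff_t)sizeof(arr);
}
```

Only the types differ, which is exactly why assigning &arr to an int ** is rejected by the compiler rather than silently accepted.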
Additionally, writing T fn() or T (*fn)() in function argument lists is also the same: the first gets automatically adjusted to the second, very much like array types being automatically adjusted to pointer types.
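A small sketch of these function rules follows. All names (twice, apply_a, apply_b, function_spellings_agree) are invented for illustration. The two apply functions differ only in how the parameter is spelled, and the assignment to the apply pointer compiles only because both spellings denote the same type.

```c
// Hypothetical function used as the callee.
int twice(int x) { return 2 * x; }

// Parameter written with a function type; adjusted to int (*fn)(int).
int apply_a(int fn(int), int x) { return fn(x); }

// Parameter written with the pointer type directly.
int apply_b(int (*fn)(int), int x) { return (*fn)(x); }

// Returns 1 if every spelling refers to the same function.
int function_spellings_agree(void) {
    int (*p)(int) = twice;   // the function name converts to a pointer
    int (*q)(int) = &twice;  // explicit address-of gives the same pointer
    int (*r)(int) = *twice;  // dereferencing converts straight back again

    // apply_a has the same type as apply_b despite its spelling.
    int (*apply)(int (*)(int), int) = apply_a;

    return p == q && q == r
        && apply(twice, 4) == apply_b(&twice, 4)
        && (**twice)(4) == 8;
}
```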
Arrays by value
Fundamentally, an array type is similar to a struct with all members being of the same type. But arrays are often used in a way that structs aren’t. We rarely get the address of the second member of a struct. This is probably because an array with its head shifted remains an array, just of a different size. Since we often ignore, or are ignorant of, the size of an array, this is a natural way to deal with arrays.
I think it would’ve been much easier to mentally model the situation if C had employed a strict separation of arrays and pointers.
Arrays should act just like structs. Passing a char[5] to a function should pass the actual five values in the array. It should be like having five char arguments to the function.
    // Hypothetical C with by-value arrays; in real C this modifies the caller's array
    int compute(int arr[3]) {
        arr[2] += arr[1];
        arr[1] *= arr[0];
        arr[0] *= (arr[1] + arr[2]);
        return arr[0] - arr[2];
    }
    int arr[3] = {10, 20, 30};
    int result = compute(arr);
    // arr is not modified
A pointer to an array would therefore involve only one level of indirection. If you wanted to treat an array like a pointer, you’d have to manually write &arr[0] to get a pointer to the first element of arr.
    #include <stdbool.h>
    void toggle(bool *flag) {
        *flag = !*flag;
    }
    bool arr[2] = {true, true};
    toggle(&arr[1]);
The most obvious immediate benefit is that this makes the language less confusing to learn. It’s very easy to be confused, as a beginner, by the fact that writing to an array inside a function does change the array outside the function, but the same isn’t true for structs.
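The asymmetry can be seen in today's C with a short sketch. The struct triple wrapper and the function names are made up for this example. Wrapping the array in a struct is one way to get the by-value behaviour described above without changing the language.

```c
// Hypothetical wrapper: an array inside a struct is copied along with it.
struct triple {
    int v[3];
};

// v is really an int *, so this writes to the caller's array.
void bump_array(int v[3]) { v[0] += 1; }

// t is a copy of the caller's struct, so this write is lost on return.
void bump_struct(struct triple t) { t.v[0] += 1; }

// Returns 1 if the bare array changed and the wrapped one did not.
int by_value_demo(void) {
    int arr[3] = {10, 20, 30};
    struct triple wrapped = {{10, 20, 30}};

    bump_array(arr);
    bump_struct(wrapped);

    return arr[0] == 11 && wrapped.v[0] == 10;
}
```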
Normally, the presence of references makes this delightfully explicit and easy to understand in C. In fact, C is, in this respect, much simpler and easier to understand than languages like Python, where objects are pointers by default, and C++, where an argument may be passed by reference...