CMU 15-113: Parsing variable declarations

Variable declarations in C can be ugly.

    int *a[10][20];
    int (*(*b)(int))[10];
    int c(int (**c)());
    void (*signal(int sig, void (*func)(int)))(int);

But that's no reason to shy away from them. Once you know how to parse C's variable declarations, you'll never have to wonder "do I need one asterisk, or two?" ever again.

You'll see me use the phrase "just like ordinary math" a lot in this lecture; that's because C really is designed to be a lot like ordinary math, and the more you internalize that fact, the easier C programming will seem to you.

`*` is an operator

First things first. Just like ordinary math, C has binary operators, such as / and <<, that require two operands, one on each side. And just like ordinary math, C has a few unary operators, such as -, which take only one operand. For example, just as "1+1" adds 1 and 1 to yield 2, "-1" applies the unary - operator to 1, yielding the value −1.^*

You might not have realized this before, but the "address-of" and "dereference" operators, & and *, are unary operators too, just like -, and follow all the same rules. The only important difference is that while applying - to an int gives you back an int, applying & to an int gives you an int *.

`[]` is an operator

Those three operators (and also unary +) are all prefix unary operators. C also has four postfix operators: [], (), ., and ->. (The last two have to do with structs, and you can safely forget about them for now.)^* The () operator, just like ordinary math, is the "function call" operator. "f(x)" means "evaluate f and x, and then apply the function f to x."

Just like ordinary math, postfix operators bind tighter than prefix operators. "-f(x)" means "evaluate f(x), and then negate it." And then, since all the prefix and postfix operators behave the same way, "*f(x)" means "evaluate f(x), and then dereference it"; "&f[x]" means "take the xth element of f, and then yield its address," and so on.

Parsing expressions

Here are some exercises in parsing C expressions. There is a precedence table in the back of your copy of K&R, but you don't need to look at it for these exercises — they only involve two simple rules.

Unary operators always bind tighter than binary operators.
Postfix operators always bind tighter than prefix operators.

Explain in words what the following expressions do.

    *a[10]
    (*b)[10]
    *(*c)[10] + *d(*e) + **f
    -*g*-*h
    --i*-*j*-*k--

Now, for each variable, give one example of a type it could reasonably have, in order for the expression to type-check.^*

Digression: Common sense

We know that the declaration int *p; declares a pointer to an int, and int a[5]; declares an array. What do you expect the following lines to declare?

    int i, j;
    int* p;
    int a[5], b[7];
    int *q, *r;
    int *s, k;
    char* mstr, nstr;

Does the last line actually declare two pointers to char? What does this exercise imply about the "correct" position of the asterisk in pointer declarations and expressions?

Declaration reflects use

"Declaration reflects use" is the cornerstone of understanding declarations in C and C++. Java totally screws it up, which is one of many reasons you'll find a lot of C programmers who look down on Java.

Consider the following declarations:

    int *a[10];
    int (*b)[10];
    int *(*c)[10];

They should look familiar, because they're just the first three examples from the last section, with the primitive type-name int stuck on the front. All declarations in C follow this basic format: Something that looks like an expression, with a type-name stuck on the front.

Read the first declaration this way: "*a[10] is an int." Or, using our newfound ability to parse expressions: "If you take the somethingth element of a and dereference it, you get an int." In other words, the variable a is something with elements — i.e., an array. Those elements can be dereferenced to yield ints — i.e., the elements are pointers to int. Therefore, a is an array (with 10 elements) of pointers to int.

This method of reading declarations is also sometimes called "reading from the inside out."

Try the next one: If you dereference b, and then take the somethingth element of that, you get an int. So b must be a pointer to an array (of length 10) of int.

Function declarations

The fun part of C, of course, is declaring functions. To declare a function, you follow the "declaration reflects use" principle, except that each function parameter gets its own mini-declaration. If any of those parameters involve functions themselves, then you can recurse again, or you can just put a pair of empty parentheses and move on. It's up to you.

A function declaration whose parentheses do contain mini-declarations of each parameter is said to be a function prototype. So, "prototype" is not strictly a synonym for "declaration," but in this class you'd better not write any declarations that aren't also prototypes!^*

    int marypoppins(int *umbrella, char (*nursery)());

That line prototypes a function named "marypoppins" that takes two arguments and returns an int. The first argument is of a type that can be dereferenced to get an int, namely, pointer-to-int. The second argument, following the same inside-out rule, turns out to be a pointer to function returning char.

As you see, we can declare pointers to functions. However, C doesn't let us declare arrays of functions. (It makes sense. After all, how would you initialize one?) We can still declare arrays of pointers to functions, and pointers to pointers to functions, and so on. This can be useful — but you might not see how, until you take 15-212.

One more thing about function prototypes: You're allowed to give the function parameters names, like "umbrella" and "nursery", but it's also valid to leave those names out of a function declaration. (Not in a function definition!) So the declaration below means the same thing as the original one:

    int marypoppins(int *, char (*)());

The compiler is smart enough to figure out where to insert place-holder names to make it parse correctly. The (*) in particular is a tipoff that something's missing there.

The ugly examples from the top of the page

    int *a[10][20];
    int (*(*b)(int))[10];
    int c(int (**c)());
    void (*signal(int sig, void (*func)(int)))(int);

The first one, reading from the inside out, is an array of 10 arrays of 20 pointers to int.

The second one, again reading from the inside out, is a pointer to function taking one int argument and returning a pointer to an array of 10 int.

The third one is a function taking one argument and returning int. That one argument is a pointer to pointer to function returning int, with nothing specified about its arguments.

The fourth is a function taking one argument of type int and a second argument of type void (*)(int) (that is, a pointer to function taking int and returning nothing), and returning a pointer of the same type as its second parameter. Unlike the first three, this is a real function prototype: you can find it in the <signal.h> header on your Unix system.

Exercise for the clever: K&R's `dcl`

In Chapter 5 of your K&R textbook, section 5.12, the authors present two programs: one that parses declarations from C into English, and one that does the reverse. If you want to learn C faster, try to solve exercises 5-18, 5-19, and 5-20. Exercise 5-19 is probably the easiest. If you try the exercises, you can e-mail me (ajo@) with questions.

Footnotes

That's right — C doesn't have any way to explicitly specify a negative integer literal! When you write "-42", the C compiler will compute that value by applying the unary minus operator to the value 42. If you learned about signed integer overflow in one of your previous classes, then you should be able to see that this can cause great confusion with code that tries to specify the value −32768 (or −2147483648 on 32-bit systems) as an int! (And if that doesn't make sense to you, it's okay. I'm just footnoting it for those interested in the quirks of C.)
Java has the same syntactic quirk, but includes a special case: the literal 2147483648 is specially allowed as the operand of the unary - operator, and nowhere else.
Okay, really there are two more postfix operators and three more prefix operators. The increment and decrement operators ++ and -- can be used in either "fix," and the keyword sizeof is also syntactically an operator, despite looking like a function. We'll see later that sizeof is very useful and important in C.
"Type-check" is a programming term that you should have been exposed to already. If an expression, function, or program type-checks, then all the types check out; it doesn't try to apply the "dereference" operator to an int, or evaluate 1 << 3.14, or take the address of 42. For example, "*a[10]" typechecks if a is an array of pointers. (Or if a is a pointer to pointer, or pointer to array, or array of arrays... but those are the smart-aleck answers.)
What if you want to specify that a function takes no arguments? You can't leave the parentheses empty, because that wouldn't be a prototype. So instead, you put the special keyword void inside the parentheses. That prototypes the function correctly.

    int uncle_albert(void);

The keyword void is also used in two other places in C. It is one of three keywords in C that are overloaded in that sense. (The other two are extern and static.) The first place: You can specify that a function returns no value by using void in place of the primitive type-name, as in "void foo(int);". The second place: C has a generic pointer type called "void *", pronounced "void pointer." (But you can't declare a variable of type void, or an array of void, or anything like that.) You won't need to understand void pointers for this course, but you will be using them all the time, anyway, once we get to malloc.

This page was last updated 24 March 2006
All original code, images and documentation on this page are in the public domain.