CMU 15-113: Lecture notes errata

These are errata (that is to say, corrections) to the lecture notes posted on Tim Hoffman's course page as of 30 November 2005. I've organized the errata basically chronologically, but you don't need to read this page from top to bottom to get something out of it. Skip around! Look at the code samples! Have fun!

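The code samples in these errata are color-coded; red text means an error to be deleted, and blue text means a correction to be inserted. Where a sample shows two versions of the same line, the first is the error and the second is the correction. The accompanying text usually explains why the correction is needed.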

Now, on to the errata!

30 August 2005

In "demo1.c":

    printf("Please enter an integer: ");
    fflush(stdout);

C's input is line-buffered, which means the program reads a line at a time, waiting for a newline ('\n') rather than processing input every time you touch a key on the keyboard. This is a good thing, since otherwise you, the programmer, would have to code up special cases for keys like "backspace" and "carriage return." With line-buffered input, all those weird control characters are handled invisibly by the operating system or by the C runtime system.

But C's output is line-buffered, too. That is, if you try to print something to standard output, or to a file, it may not actually appear there until you also print a newline to that same device (which is called a stream, in C/Unix lingo). Then the full line appears "all at once" on the output stream.

But what if we don't want a newline in our output? Then we need to use the fflush function to "flush out" all the data written so far onto the screen, or into a disk file, or wherever it's going.

A note: The fflush function, as I described it, works only on output streams; those are the only streams where there's anything to "flush out." In fact, fflush is only defined to work on output streams; if you try compiling fflush(stdin) you'll get an error message at best, and undefined behavior at worst.
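Here is a minimal sketch of the prompt-then-flush pattern described above. The helper name show_prompt and its parameters are my own invention for illustration; nothing like it appears in "demo1.c".

```c
#include <stdio.h>

/* Write a prompt with no trailing newline, then flush it so that it
   actually appears before the program starts waiting for input.
   Returns 0 on success and EOF if either step fails. */
int show_prompt(FILE *out, const char *prompt)
{
    if (fputs(prompt, out) == EOF)
        return EOF;
    return fflush(out);    /* out is an output stream, so this is well-defined */
}
```

A caller would write show_prompt(stdout, "Please enter an integer: ") immediately before the call to scanf.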

Undefined behavior

"Undefined behavior" is the technical term, in C, for "anything might happen." Two canonical examples of undefined behavior are dereferencing a null pointer and trying to evaluate an uninitialized variable. The two canonical results of these actions are to defrost your refrigerator and to cause demons to fly out of the user's nose. Compare the DeathStation 9000.

"Thus a #include increases the size of your compilation object..." This is unfortunate wording, since "object code" is something else in the C/Unix world, and #including header files doesn't generally increase the amount of object code produced. But the sentiment is right: Don't #include headers you don't need.

Character encoding

It's worth knowing that C does not specify the ASCII encoding for characters — a good thing, in this age of Unicode, but a much better thing back in the '70s, when each computer manufacturer had its own character encoding. The result is that if you want to write real, portable C code, you can't rely on things like 'A' being 65, or ')' being 1 greater than '('.

C does enforce a few constraints on the character encoding, though. For example, the digits are contiguous and in order: '7' == '0' + 7. The same can't be said of the letters, unfortunately. (This is because in one very common encoding of the '60s through '90s, EBCDIC, there really was a gap between the letters 'I' and 'J'! C was specified to work around that historical quirk, and others like it.)
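To make the distinction concrete, here is a sketch of two helpers: one that leans on the guarantee about digits, and one that avoids assuming anything about letters by looking them up in a table. Both function names are hypothetical, chosen just for this example.

```c
#include <string.h>

/* Portable: C guarantees that '0' through '9' are contiguous and in order. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;    /* not a decimal digit */
}

/* Letters carry no such guarantee, so search a table instead of
   computing c - 'A'. Returns 0 for 'A' or 'a', 25 for 'Z' or 'z',
   and -1 for anything else. */
int letter_index(char c)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    const char *p;

    if (c == '\0')
        return -1;    /* strchr would otherwise match the terminator */
    if ((p = strchr(upper, c)) != NULL)
        return (int)(p - upper);
    if ((p = strchr(lower, c)) != NULL)
        return (int)(p - lower);
    return -1;
}
```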

Integer sizes

Obviously, ints aren't always 32 bits — how could anyone write 64-bit code then? Likewise, char doesn't have to be eight bits (think Unicode!); in fact, none of the basic data types have any upper limit on their size. Again, there are a few constraints — but there's no need to get into them here. In this class, you may generally assume that int is "big enough," whether that be 16 bits, 32, or 1024.

The standard header <limits.h> provides a lot of macros, such as INT_MAX, which will yield the exact ranges of all the basic types. Thus we can write portably for any system:

    int i = 2147483647;    /* before: assumes a 32-bit int */
    int i = INT_MAX;       /* after: portable */
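The same macros are handy for more than initializers. As a sketch (the function name add_would_overflow is hypothetical), here is one way to ask whether an addition would leave the range of int on whatever system the code is compiled for:

```c
#include <limits.h>

/* Returns 1 if a + b would fall outside the range of int, else 0.
   The check never performs the addition itself, because signed
   overflow is undefined behavior. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return 0;
}
```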

Plain char (yes, that's the technical term!) may be either signed or unsigned, depending on your compiler. If it matters, you can explicitly specify signed char or unsigned char, but in general you shouldn't be using chars to hold values that don't correspond to actual characters. As the lecture notes suggest, using char as a "small int" is possible, but terribly bad form.

Obviously, besides the four other possibilities mentioned in the notes, scanf might return EOF.
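A sketch of how a caller might tell the cases apart. To keep the example self-contained it reads from a string with sscanf, which follows the same return-value rules as scanf, with the end of the string standing in for end-of-file. The enum and function names are assumptions made up for this example.

```c
#include <stdio.h>

enum scan_status { SCAN_OK, SCAN_BAD_INPUT, SCAN_END };

/* Try to read one int from text and classify what happened. */
enum scan_status scan_one_int(const char *text, int *value)
{
    int result = sscanf(text, "%d", value);

    if (result == EOF)
        return SCAN_END;          /* ran out of input before any conversion */
    if (result == 0)
        return SCAN_BAD_INPUT;    /* input was present, but was not an int */
    return SCAN_OK;               /* one conversion succeeded */
}
```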

Compiling your programs

If you're using GCC for this course, you should always compile your code with

    gcc -O2 -W -Wall -ansi -pedantic *.c

All those parameters are cumbersome, but if you type them every time, you'll be amply rewarded — GCC knows a lot about C, and can warn you when you're about to do something stupid, such as evaluate an uninitialized variable or pass the wrong kind of argument to printf or scanf.

For the really assured, most Unix machines come with a program called lint or splint, which does clever code analysis to figure out a lot more of the sneaky errors. It can sometimes help with pinpointing memory leaks. Unfortunately, splint is an ultra-liberal when it comes to warning messages, so it will often throw a fit over perfectly correct code! This is why I say it's for the really assured: At some point, you have to be willing to say, "Yes, I know splint is warning me about this construct, but I know it's correct, anyway."

Generally, you should not use splint — especially not as a last resort! If you're so confused that you need splint's help, you're probably too confused to understand anything it's likely to tell you. Personally, I don't use splint.

Note that what the lecture notes call "strict C" is technically called "C89," or "C90," or "C95," or "ANSI C." However, the actual, current C standard is "C99," and it does allow some C++isms, including // comments and variable declarations in the middle of a block. Still, this course follows common industry practice in adhering strictly to C89. Don't use C++isms in your code.

The notion of "point type" is peculiar to our lecture notes. The technical term is "the type of the thing the pointer points to." No jargon needed... for once!

As with the basic types, there's nothing magic about pointers that requires them to be exactly 32 bits. In fact, many DOS systems have 16-bit pointers.

Dereferencing a null pointer does not halt the program; it invokes undefined behavior. Don't Do This.

1 September 2005

Obviously, C has more than one array type, just as it has more than one function type, pointer type, and so on.

The name of an array variable is not the same as the address of the first element of the array; at least, not in all cases! Consider the following code snippets.

    #include <stdio.h>

    int main(void)
    {
        char *p;
        char arr[42];
        printf("%d\n", (int) sizeof p);
        printf("%d\n", (int) sizeof &arr[0]);
        printf("%d\n", (int) sizeof arr);
        return 0;
    }

The first two will print the size of a pointer to char on your system, probably 4. The third value printed, however, will definitely be 42, since that is literally the "size of" the array in question. Thus, the expression arr doesn't always behave like the expression &arr[0].
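One practical consequence, sketched below: the common "number of elements" idiom works only where the array is still an array. The macro and function names are hypothetical.

```c
#include <stddef.h>

/* Number of elements in an array. Only meaningful where arr still has
   array type, that is, in the scope where the array is declared. */
#define ARRAY_LEN(arr) (sizeof (arr) / sizeof (arr)[0])

/* A parameter written as "int a[42]" really means "int *a", so inside
   a function sizeof can only ever report the size of a pointer. */
size_t size_seen_by_callee(int *a)
{
    return sizeof a;
}

size_t length_at_definition(void)
{
    int arr[42];
    return ARRAY_LEN(arr);    /* arr has not decayed to a pointer here */
}
```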

In "arrDemo1.c": The usual admonition about fflush applies to this code. Also, the comment "an array's name is a const pointer" has already been debunked.

    #define EXPECTED_CONVERSIONS 2    /* before: reserved name */
    #define CONVERSIONS 2             /* after */

In this case, we have run afoul of a little-known restriction in C. The C language is evolving; as I mentioned earlier, it has already gone through two major revisions (C89 and C99) and quite a few minor technical changes. But old C code is still around. So we have a backwards-compatibility problem here. The new compilers need to work with old code, and — to whatever extent possible — the code we write today needs to continue to work when new compilers come out!

So C has adopted a concept called reserved identifiers. Some identifiers (that is to say, variable names, but also names of functions, data types, macros, and other things) are simply reserved for future extensions to the language. We, as C programmers, aren't allowed to use them. Some of these reserved identifiers are more obvious than others. For example, we can't define a new function called strrev, because that might conflict with a future extension to the language. In fact, all names beginning with str and a lowercase letter are reserved. So are names beginning with to and a lowercase letter.

(But don't go changing everywhere you wrote int top; in your code. These restrictions generally only apply to names with global scope, not function parameters or local variables.)

The particular restriction here is on names beginning with a capital E followed by another capital letter. The fix? Get rid of the initial E!

But there's a deeper problem with the CONVERSIONS macro. After all, what is it being used for? It's being used in the getInts routine like this:


    int getInts(int *a, int *b)
    {
        printf("Enter %d ints sep. by whitespace: ", CONVERSIONS);
        fflush(stdout);
        return scanf("%d %d", a, b) - CONVERSIONS;
    }

Okay, that looks good, right? We're prompting the user for CONVERSIONS integers, and then scanning them in and returning the negative of the number of integers we failed to read. (Here's a tip: If you find yourself having to use phrases like "the negative of the number of integers we failed to read" while explaining your code, you're probably over-complicating things!)

The problem is this: Suppose we want to read three numbers instead of two. We might be tempted to #define CONVERSIONS 3 and recompile. But this wouldn't be good enough! We also need to go down to the getInts function and update the format string to scanf, from "%d %d" to "%d %d %d". (By the way, the spaces in that format string are unnecessary.)

And then, of course, we'd have to change the prototype and definition of getInts so that it took three parameters, and update any code that actually used getInts... The point is, there's no reason for CONVERSIONS to exist as a macro at all, since it's hard-coded in so many other places! If you mean "2", just write "2".
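If the count genuinely needs to vary, it has to be a real parameter that drives a loop, not a macro that must agree with a hand-written format string. Here is a sketch under that assumption. It parses from a string with sscanf so that it stands alone; the name parse_ints is hypothetical, and with a stream you would call fscanf in the loop instead and drop the %n bookkeeping.

```c
#include <stdio.h>

/* Read up to max ints from text into out. Returns how many were
   actually read, which is simpler to explain than "the negative of
   the number we failed to read". */
int parse_ints(const char *text, int *out, int max)
{
    int count = 0;
    int used;    /* characters consumed by the latest conversion */

    while (count < max && sscanf(text, "%d%n", &out[count], &used) == 1) {
        text += used;
        count++;
    }
    return count;
}
```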

In the first sub-bullet on the page, "empty braces" should be "empty brackets." In the second sub-bullet, both "braces" should be "brackets."

In the fourth sub-bullet: The "array is a pointer" misconception has already been debunked. However, in some cases an array's name will evaluate to a pointer to the first element of the array. (Note that no pointer variable exists. This pointer is a pointer value, just like 42 is an integer value — the value of the address of arr[0].)

By now it should be clear precisely why you can't assign a new value to an array like this:

    int arr[42], barr[42], *p = arr;
    arr = 5;    /* arr is not an int! */
    arr = p;    /* arr is not a pointer! */
    arr = barr;

In the last of these lines, the expression barr decays into a pointer to barr[0], so again we're trying to assign a pointer to arr, and failing. The compiler won't let us do that.

To copy one array over another, you can use

    int arr[42], barr[42];

    memcpy(arr, barr, sizeof barr);  /* copy the whole array */
    memcpy(arr, barr+10, 22*sizeof *barr);  /* just 22 elements */

The memcpy function (declared in <string.h>) is like the strcpy function for strings, but it doesn't care what kind of data it's copying, and it doesn't stop for zeros. So you have to provide the number of bytes you want copied from barr to arr: the whole array in the first call, and just the middle 22 elements in the second.
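Wrapped up as a function, the byte-count arithmetic might look like the sketch below. The name copy_ints is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy n ints from src to dst. The two regions must not overlap;
   memmove is the function to reach for when they might. */
void copy_ints(int *dst, const int *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);    /* memcpy counts bytes, not elements */
}
```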

Header files

In "arrDemo2": As the comments say, the indentation in the header file is not typical. Typical indentation looks like this:

    #ifndef H_BITAP
     #define H_BITAP

    #include <stdio.h>

    #define WORD_LENGTH 16

    typedef struct bitap_info bitap_info;
    struct bitap_info {
        int matches;
        int *pos;
    };

    int bitap_fsearch(FILE *fp, const char *key);
    int bitap_search(const char *text, const char *key);

    #endif

Here I've made up a fictitious header file including all the parts you're likely to see in this course: the inclusion guard, an #include directive, a macro definition, a struct definition and associated typedef, and a couple of function prototypes. You'll normally want to put things in this order.

The #include directive is only there because otherwise the reference to FILE might not be recognized by the compiler (depending on whether the client code #included <stdio.h> before this file). In most of your work, the header files you write should not #include any other headers. The rule is: If you don't need it, don't drag it in.

Strings

As the notes say, NULL is a pointer. It is not, and never has been, equivalent to the null character '\0'. (Although, confusingly, it is possible to assign '\0' or any other integer with value zero to a pointer, and then that pointer becomes a null pointer, just like NULL. However, this isn't something you should be doing. Just use NULL for pointers, '\0' for characters, and 0 for integers, and you'll never be confused.)

A string in C is slightly more than an array of char — it's an array of char which contains a null terminator (the aforementioned '\0'). An array of characters that doesn't contain a null terminator isn't a string. For example:

    #include <string.h>

    int main(void)
    {
        char not_a_string[5] = "hello";    /* no room for the '\0' */
        char buffer[42];
        strcpy(buffer, not_a_string);      /* undefined behavior */
        return 0;
    }

You can't strcpy something that's not a string!
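If you have a buffer of known size and want to know whether it holds a string, you can look for the terminator without running off the end. This is only a sketch, and the name has_terminator is made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first n bytes of buf include a '\0', which is what
   makes the array hold a string; returns 0 otherwise. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}
```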

In "strings": The usual admonition about fflush applies to this code. Also, the usual admonition about buffer overflows applies to the scanf calls, but of course that's the point.

NULL is not the same as '\0', and shouldn't be used interchangeably with it. That's just gratuitously confusing. Also previously mentioned: C is not ASCII, so the explication of Question #2 is wrong for some platforms.

What `const` means

Still in "strings": The whole confusing, self-contradictory and incomplete explanation of const ought to be removed. Here's the real deal: const is a keyword in C (and C++) that means "Don't modify this." More specifically, if you see const in a variable or parameter declaration, it means "Don't modify the object to my right."

    int sun;
    int const life = 42;
    int * const sunflower = &sun;
    int const *finger = &life;

These four lines define an int (obviously); a const int called "life" with the fixed and unchangeable value 42; and two pointers, sunflower and finger. sunflower is a const pointer, so you can't change its value — it will always point to sun. However, you can assign a new value to *sunflower (that is, you can assign a new value to sun through sunflower).

Contrariwise, finger is a non-const pointer to a const int. You can see this by looking at what's to the right of the keyword const in the declaration; it says, "Don't modify *finger." But as long as there's no const in front of finger itself, we're allowed to modify it.

The compiler will warn us — in fact, it will refuse to compile the code at all — if we try to assign to a const object, directly or indirectly. Also, note that sunflower cannot be made to point to life, because the compiler "knows" that *sunflower is modifiable and life is not.

Because the compiler enforces const so strictly, we can use its presence as a "contract" with the client, or with a library such as <string.h>. Since strcpy's second argument is declared as "const char *", we know that strcpy will never modify that argument's contents. This is an important guarantee to have!

You may be wondering what "const char *" means, anyway, since in that case the thing to the right of const is "char *". Well, that's a special case. If there's a typename to the right of const, then the declaration means the same thing as if you swapped the const and the typename: "const char *" means the same thing as "char const *" — it means, "Don't modify the characters this pointer points to."
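The two kinds of const pointer show up most often in parameter lists. A sketch, with hypothetical function names; the commented-out lines are the ones a compiler would reject.

```c
#include <stddef.h>

/* "const int *values": a promise not to modify the ints pointed to. */
long sum_values(const int *values, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += values[i];
    /* values[0] = 0;    rejected: the pointed-to ints are const */
    return sum;
}

/* "int * const dst": dst itself cannot be changed, but *dst is writable. */
void fill_values(int * const dst, size_t n, int v)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = v;
    /* dst = NULL;       rejected: dst is a const pointer */
}
```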

In the description of strcmp, the first "strcpy" should be "strcmp". The key concept here is "lexicographic order" — lexicographic as in a dictionary. strcmp sorts "pear" before "pearl" and after "pea."
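You can watch that ordering happen by handing strcmp to qsort. A sketch follows; the names compare_words and sort_words are my own, not from the notes.

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes the comparator pointers to the array elements; each
   element here is a "const char *", hence the double indirection. */
static int compare_words(const void *a, const void *b)
{
    const char *const *wa = a;
    const char *const *wb = b;
    return strcmp(*wa, *wb);
}

/* Sort an array of n string pointers into strcmp order. */
void sort_words(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, compare_words);
}
```

Given the array { "pearl", "pea", "pear" }, sort_words leaves it in the order "pea", "pear", "pearl".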

In "2D-strings1":

    #define ROWS 3;    /* before: stray semicolon */
    #define COLS 5;    /* before: stray semicolon */
    #define ROWS 3     /* after */
    #define COLS 5     /* after */

This is another example of when not to use macros — the numbers of columns and rows in the matrix are essentially hard-coded by the calls to strcpy.

The mention of scanf should refer to strcpy instead.

6 September 2005

I prefer to see words such as "command-line arguments" written out, not abbreviated as "cmd args" or other such l33tsp34k. That's just me.

This isn't a mistake in the notes, but I think it's worth mentioning the etymology of "argc" and "argv": You can think of the "c" as standing for "count" and the "v" as standing for "values." Thus argv is a pointer to the first element of an array with argc values in it, indexed 0 through argc-1.
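A sketch of the usual loop over those values. The function name count_options is hypothetical; the only thing it assumes about the arguments is the convention that argv[0] holds the program name.

```c
/* Count the arguments that look like options, i.e. start with '-'.
   The loop starts at 1 to skip the program name in argv[0]. */
int count_options(int argc, char **argv)
{
    int i;
    int count = 0;

    for (i = 1; i < argc; i++) {
        if (argv[i][0] == '-')
            count++;
    }
    return count;
}
```

From main you would simply call count_options(argc, argv).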

In "fileIODemo1.c":

    printf("Cannot open input file: %s", fileName);

Note also that fileName should be declared const char *, since the function doesn't modify the string.

    while ((fscanf(inFile, "%s", wordBuffer) > 0) &&
           (*wordCount < MAX_WORDS))
    {

See the page on safe input. In short, the quickest fix would be

    while ((fscanf(inFile, "%29s", wordBuffer) > 0) &&
           (*wordCount < MAX_WORDS))
    {

However, that's not a desirable fix, since it hard-codes the value 29 into the program, rendering our definition of MAX_WORDLEN utterly redundant. One correct fix would be

    char spec[10];
    sprintf(spec, "%%%ds", MAX_WORDLEN-1);
    while ((fscanf(inFile, spec, wordBuffer) > 0) &&
           (*wordCount < MAX_WORDS))
    {

But frankly, that's a little silly. The most practical and realistic solution is probably to define wordBuffer with a very large size (say, 1000 or 2000), call fscanf with the matching field width hardcoded, and then test strlen(wordBuffer) against MAX_WORDLEN before performing the strcpy.

An alternative to testing strlen would be to truncate the input word, like this:


    char wordBuffer[1000];

    while ((fscanf(inFile, "%999s", wordBuffer) > 0) &&
           (*wordCount < MAX_WORDS))
    {
        sprintf(dictionary[*wordCount], "%.*s", MAX_WORDLEN-1, wordBuffer);
        *wordCount += 1;
    }

The call to sprintf will copy at most MAX_WORDLEN-1 characters from wordBuffer into the dictionary. Unlike the dangerous function strncpy, it will always null-terminate its output, so we don't need to worry about not ending up with a string.
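The same idiom can be pulled out into a small helper. This is a sketch with a hypothetical name, copy_truncated. The detail that is easy to get backwards is the argument order for "%.*s": the int precision comes first, then the string.

```c
#include <stdio.h>

/* Copy src into dst, keeping at most dst_size - 1 characters and
   always writing a terminating '\0'. dst_size must be at least 1. */
void copy_truncated(char *dst, size_t dst_size, const char *src)
{
    sprintf(dst, "%.*s", (int)(dst_size - 1), src);
}
```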

"You could put both values in brackets but all you would accomplish is to restrict your function to accepting only a matrix of those exact dimensions." This is just plain wrong. The compiler will ignore the useless dimensional information. Remember, arrays in C are passed as pointers to their first elements; thus, all four prototypes

    int foo(char a[5][7]);
    int foo(char b[7][7]);
    int foo(char c[][7]);
    int foo(char (*d)[7]);

are perfectly compatible with each other.
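You can check the compatibility claim by declaring one function several ways in the same file and then calling it with matrices of different heights. The sketch below does that; the function name sum_lengths and its nrows parameter are assumptions added for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Three declarations of one function. The compiler accepts the
   redeclarations because each parameter adjusts to "char (*rows)[7]". */
size_t sum_lengths(char rows[5][7], int nrows);
size_t sum_lengths(char rows[][7], int nrows);
size_t sum_lengths(char (*rows)[7], int nrows);

/* Add up the lengths of the strings stored in the first nrows rows. */
size_t sum_lengths(char (*rows)[7], int nrows)
{
    size_t sum = 0;
    int i;

    for (i = 0; i < nrows; i++)
        sum += strlen(rows[i]);
    return sum;
}
```

Only the row width, 7, is part of the parameter's type; the number of rows has to travel separately, which is why nrows is there.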

Man pages aren't much like an API (application programming interface). They're more like documentation. In fact, while it's common to hear programmers talk about the Java API, in C and C++ programmers typically refer to the standard library, and mentioning the "C API" will get you funny looks.

Strictly speaking, a FILE object in C is not a file handle. That phrase has a specific technical meaning on Unix and Linux systems; a FILE object may contain a file handle, but isn't one itself. (If you take 15-213, you will cover file handles in excruciating detail.)

return is not a function, and as such cannot be "called." (In fact, return is the keyword you use to do the opposite of "calling"!)

"Reading individual characters from the keyboard": See the page on safe input.

"If your program was hard to write it should be hard to read :-)". This, surprisingly, is good advice — and vice versa. If I'm reading your code, and I'm finding it hard to read, that should be a good indication that you found it hard to write. And if you had trouble writing it, that's probably because you didn't understand the material, and therefore deserve a poor grade on the lab. Conclusion: If your code is hard to read, your grade will suffer as a result. Don't say we didn't warn you. :)

Another comment in the same vein:

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
—Brian W. Kernighan

Yes, that is the Kernighan of "Kernighan & Ritchie."

One piece of advice on Lab 1: Not only should you be using a small input dictionary to start, you should be using a very small MAX_WORDS and MAX_WORDLEN, to make sure that your program handles bad input gracefully. In Lab 2, the same advice applies to INITIAL_LEN, so that you can easily see whether your array-resizing code works correctly.

This page was last updated 15 December 2005
All original code, images and documentation on this page are in the public domain.