CMU 15-113: Miscellany

Introduction to the Unix command line
Command syntax
Man pages
More about C: the sizeof operator
Assert
Stream I/O
How to keep a 15-113 student busy
Footnotes

Introduction to the Unix command line

A growing number of intro students have never actually seen a command line before. (That's scary.) Well, if you haven't seen one before, then you were probably feeling lost on the first day. Here's a quick rundown:

Command-line interfaces predate windowed interfaces like MacOS and Windows by a long way. For a history that's alternately amusing and desiccated, see Neal Stephenson's In the Beginning was the Command Line. Basically, all you need to know is that command-line interfaces are really old, and so if something doesn't make sense to you now, it might be because it really doesn't make sense anymore. For example, the Unix command to remove all the files in a directory is rm *. Four characters, hastily typed, to irrevocably trash all your files. (Unix doesn't have a "recycle bin," either.^*) This is stupid today, but back in the old days, you had to type all your commands on a teletype keyboard — a monstrosity that took several pounds of pressure to depress each key. So forcing the user to type really remove all my files please instead would have been pretty cruel.

Windows has a command-line interface called "Command Prompt." (It's also referred to as the "DOS box" or "DOS prompt" because the interface is very similar to Microsoft's old operating system MS-DOS, short for Microsoft Disk Operating System. Contrary to popular belief, Microsoft wasn't the only software company with a DOS; that's why they needed to call theirs MS-DOS, to distinguish it from all the other DOSes floating around.) A few Windows commands are identical to their Unix/Linux/MacOS counterparts, but most are different; see the table below.

On Mac OS X, the command-line interface is called "Terminal," and it's basically the same as Unix.

The following Unix commands will be useful to you. Their Windows equivalents are in the second column:

| Unix | Windows | What it does |
| --- | --- | --- |
| `ssh hbovik@unix.andrew.cmu.edu` | PuTTY | Start a new remote Unix session on one of the Andrew Unix machines (substitute your own Andrew ID for `hbovik`). |
| `exit`, `logout` | `exit` | Quit and log out of the Unix session. |
| `ls`, `ls foo` | `dir`, `dir foo` | List the contents of the current directory; list the contents of directory `foo`. |
| `cd foo`, `cd ..` | `cd foo`, `cd ..` | Change the current directory to subdirectory `foo`; change the current directory to the enclosing superdirectory. |
| `cd`, `cd ~hbovik` | | Change the current directory to your own home directory ("My AFS"); change it to user `hbovik`'s home directory. |
| `pwd` | `cd` (with no arguments) | Print the name of the current ("working") directory. |
| `rm foo` | `del foo` | Remove file `foo`. |
| `cp foo bar` | `copy foo bar` | Make a copy of file `foo` named `bar`, overwriting any existing file of that name. |
| `mv foo bar` | `move foo bar`, `rename foo bar` | Rename (move) file `foo` as `bar`, overwriting any existing file of that name. |
| `scp foo hbovik@unix.andrew.cmu.edu:bar` | `pscp` | Copy file `foo` onto a remote Unix machine, naming the copy `bar`. You can use this command if "My AFS" doesn't show up in your MacOS Finder window. |
| `cat foo` | `type foo` | Dump the entire file `foo` to the screen. |
| `less foo`, `cat foo \| less` | `more foo`, `type foo \| more` | Display the file `foo` one screenful at a time. |
| `grep foo *.c` | `find "foo" *.c`, but...^* | Find everywhere the string `foo` is located in any of the C files in the current directory. |
| `gcc -O2 -W -Wall -ansi -pedantic -o bar foo.c` | `gcc -O2 -W -Wall -ansi -pedantic -o bar.exe foo.c` | Compile `foo.c` into an executable named `bar` (`bar.exe` on Windows). |
| `man glomulate` | no such luck (though most commands print a short summary if you add `/?`) | Look at the help file (manual) for the program, command, or library function `glomulate`. Try this now! |

Command syntax

By now you've probably guessed, even if you've never seen a command line before, that commands generally look something like this:

    % program-name -oneflag -twoflag -redflag -blueflag files-to-operate-on

(The percent sign is just the Unix prompt; you don't type it.) Sometimes the program name is really a built-in shell operation, such as cd; other times it's really a separate program, like gcc. Some programs expect filenames on the command line, and others expect to get their input at the keyboard. Many, such as cat and grep, can do either: they read the files you name, or the keyboard if you don't name any.

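Some programs expect a lot of single-character flags, or options; ls is like that. Many such programs let you concatenate a bunch of flags into one parameter (ls -la means the same as ls -l -a); under that convention, the parameter -oneflag would be equivalent to the seven flags -o -n -e -f -l -a -g. GCC doesn't bundle flags that way, but like nearly every Unix program it treats any parameter beginning with a hyphen as an option, not a filename. So now, if you accidentally type gcc -myfile.c, and GCC complains about an unrecognized option, you'll know why!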

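Notice the pipe (|) in the table above. Unix, and operating systems that came after it (including MS-DOS and therefore Windows), let you "pipe" the output of one program directly into the input of another program that's expecting to get its input from the keyboard. more and less are two programs like that. (What's with the names? Well, Mies van der Rohe once said, "less is more." He was pretty much right.) If you ever get a whole screenful of error messages from gcc, try piping them through more. There's one catch: GCC prints its error messages on stderr, not on the normal output stream, and a plain | passes along only the normal output. So merge the two streams first: in the Andrew machines' default C shell, use |& instead of |; in Bash, add 2>&1 before the pipe:

    % gcc -O2 -W -Wall -ansi -pedantic demo1.c |& more
    bash-2.05b$ gcc -O2 -W -Wall -ansi -pedantic demo1.c 2>&1 | more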

Other useful programs to pipe output through include head and tail (not part of the Windows command suite, unfortunately). What do they do? How can you find out?

Two related characters are > and <, which redirect a program's output into a file and its input from a file. For example,

    % wc <demo1.c >howbig.txt
    bash-2.05b$ gcc -O2 -W -Wall -ansi -pedantic demo1.c 2> warnings.txt

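The first command runs wc ("word count") on demo1.c and saves its line, word, and character counts in howbig.txt; the second compiles demo1.c and saves GCC's warnings in warnings.txt. (Bash supports the 2> syntax; the Andrew machines' default "C shell" does not.^*)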

Man pages

The Unix man pages are unbelievably helpful. There's a man page for each function in the standard C library, so you generally shouldn't have to wonder, "How do I use this function?" Just look it up! For example, try man strcpy right now. I'll wait.

You can also man ls to see what flags ls accepts, or man gcc for that program's options. However, you should know that most of the GCC documentation is only available through the GNU equivalent of man, which is called info. Type info gcc for the full scoop.

More about C: the `sizeof` operator

In the parsing lecture, I mentioned the sizeof operator in a footnote. sizeof is terribly useful in C, and here's why: It tells you how many bytes a given object will take up in memory. (See K&R, section 6.3.)

Try it now! Run the following program:^*

    #include <stdio.h>

    int main(void)
    {
        char c; int i; long j;
        float f; double g;
        char *s;
        int a[10];
        printf("char: %d\n", sizeof c);
        printf("int: %d\n", sizeof i);
        printf("long: %d\n", sizeof j);
        printf("float: %d\n", sizeof f);
        printf("double: %d\n", sizeof g);
        printf("char pointer: %d\n", sizeof s);
        return 0;
    }

What do you observe? Is long actually larger than int on your compiler? (It must be at least as large.) Is double larger than float? (One thing we're guaranteed is that sizeof(char)==1.)

Based on the value of sizeof i, what do you think sizeof a should be? Test your prediction.

Based on the result of that test, you should be able to understand the purpose of the following function-like macro.^*

    #define NELEM(array) (sizeof array / sizeof array[0])

The sizeof operator also works on literal values, like 42 and "hello, world"; and it works on type names in parentheses, like (int) and (struct foo *). You generally shouldn't use it with either of those arguments, though, because they can get confusing. (Example: What do you think is sizeof 'a'? Try it. Is it what you expected?)

Even though you won't ever use sizeof on string literals, it's worth knowing that C treats string literals just like arrays of char. Predict the results of the following program, then check them:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char a[] = "hello";  /* declares an array */
        char *s = "hello";   /* declares a pointer */

        /* sizeof and strlen both yield a size_t; convert to int to match %d */
        printf("sizeof a: %d\n", (int) sizeof a);
        printf("sizeof s: %d\n", (int) sizeof s);
        printf("strlen(a): %d\n", (int) strlen(a));
        printf("strlen(s): %d\n", (int) strlen(s));
        return 0;
    }

Did you forget about the terminating '\0' character at the end of both strings? Did you forget the value that we already determined for sizeof s in the previous program? (Remember, sizeof has to do with the amount of storage an object needs — it has nothing to do with the value of that object!)

What do the first two lines of code below do? Are they more or less clear than the last two lines? Which pair do you think another programmer would prefer to see in a C program, if they were trying to verify that the program was correct?

    foo *bar = malloc(sizeof (foo));
    foo (**bar2)[10] = malloc(sizeof (foo (*)[10]));
    
    foo *baz = malloc(sizeof *baz);
    foo (**baz2)[10] = malloc(sizeof *baz2);

Assert

Consider the following macro definition. Why is it useful?

    #define ASSERT(x) if (!x) { printf("Assertion failed!"); exit(0); }

Why is the following macro definition qualitatively better than the one above? Give at least two reasons.

    #define ASSERT(x) do {                            \
            if (!(x)) {                               \
                printf("Assertion '%s' failed!", #x); \
                exit(EXIT_FAILURE);                   \
            }                                         \
        } while (0)

(The "#" in "#x" is called the preprocessor's "stringifying" operator. See K&R, section 4.11.)

If you are having trouble debugging a program, try inserting assertions all over the place.

Stream I/O

One major innovation of Unix, and by extension C (since C and Unix were developed in parallel), was the idea of stream input and output. Sometimes this idea is expressed as "Everything is a file" — not only files, but output to the screen, input from the keyboard, I/O over network sockets, the system's internal random number generator, and many other things.

Before C, many operating systems and languages treated disk files as random-access data structures; to scan through a file, you'd say "read file section 1, read file section 2, read file section 3..." This sounds inconvenient, but consider something like a small-business payroll application, with a data file consisting of a whole lot of same-size records sorted according to a key. Then random access allows you to do a binary search on the data without reading the whole file into memory.

But with C, the designers decided to do something different. Input and output in C work kind of like the queue at a McDonald's drive-thru. Incoming traffic "lines up" at the entrance and is admitted strictly in order. After some processing, outgoing traffic "lines up" at the window and is sent out strictly in order.

The fgetc function (and its relatives getc and getchar) takes the next character off an input queue; the fputc function (putc, putchar) adds a character to an output queue. (Each C program can have many input and output queues: one for each file that's open for reading or writing.) fflush forces any characters still waiting in an output queue to be sent out right away.

The complicated I/O functions (scanf, printf, fgets) behave as if they were built on top of the getc and putc routines. This explains why sscanf("4e", "%f%c", &f, &c) doesn't do what you might think it does: by the time the function gets to the end of the string and realizes that maybe the "e" should have been put into c instead of waiting for an exponent, it's too late. The "e" has already been taken off the input queue, and the scanf functions push back at most one character (the first one that couldn't belong to the number); they never back up over characters they've already accepted, so the "e" can't be "pushed back on" to try again. Just like a paper boat floating down a real-life stream, data in a C stream can't move against the current. Once it's read in, it's in; and once it's printed out, it's out.

How to keep a 15-113 student busy

If this course is boring you, don't read webcomics during class! Read the Infrequently Asked Questions for comp.lang.c instead. It will keep you busy, if you think you already know C. (Warning: The questions and answers are full of disinformation, on purpose. It is a satire site. If you don't get one of the jokes, you can always ask.)

And of course every smart-aleck 15-113 student should be aware of the IOCCC, the International Obfuscated C Code Contest. Again: this is a humorous pastime, intended for external use only. And most of the code in the IOCCC isn't really standard-conforming or portable, either. (Unlike the IAQ, which is specifically targeted at people familiar with the C Standard.)

Footnotes

Well, on the Andrew machines there's this directory called "OldFiles" that is periodically updated with a complete snapshot of your user directory, so you can recover somewhat from really boneheaded moves like that. But that's not generally true. And the same thing applies to the del command inside Windows' "Command Prompt" — it trashes files instantly, rather than moving them to the Recycle Bin. The moral of this story is that command-line (read: old) interfaces expect you to know what you're doing.
The Windows find utility is astronomically dumber than the standard Unix tool grep (whose name, by the way, comes from the old ed editor command g/re/p, "globally search for a regular expression and print"; you can read about the history somewhere else). So it's not really an equivalent. Windows also has findstr, which understands a limited form of regular expressions, but if you want the real thing you'll have to go download something smarter.
Why do we need 2> instead of just plain > to redirect GCC's output? Because GCC's warnings and error messages don't come out on the "normal" output stream stdout; they come out on stderr, which by default goes to the screen also. The 2 tells Bash that we really want to capture the stderr stream instead of stdout (2 is stderr's file descriptor number; stdin and stdout are 0 and 1). In this course, we won't do anything involving stderr, although we will deal with other streams besides stdin and stdout when we talk about files.
Okay, technically this program isn't conforming C code. See, the result yielded by sizeof isn't really an int: it's a special unsigned integer type called size_t, so "%d" isn't the right format specifier in this case. We have two options: We can explicitly convert the sizeof expression to int, as in printf("char: %d\n", (int) sizeof c), or we can use the "%zu" format specifier (the z length modifier combined with the u conversion for unsigned values), which was introduced in 1999 precisely so that people could printf this kind of value. If we choose %zu, our program won't conform to the C89 standard that we use in this course, but it will be conforming C99 (and GCC will accept it happily with -std=c99, too).
I prefer the following definition, for technical reasons:
```
#define NELEM(a) ((int)(sizeof (a) / sizeof *(a)))
```
It may look scarier, but you should be able to figure out what the differences are, and (after reading the previous footnote) why they might be useful.

This page was last updated 30 March 2006
All original code, images and documentation on this page are in the public domain.