The setenv fiasco

Most UNIX developers are familiar with the concept of environment variables. Basically, environment variables are a way of passing configuration information to a process. Unlike command-line arguments, which usually have to be passed to each program explicitly, environment variables are inherited: a child process starts out with a copy of its parent's environment.

The "environment" is an unordered set of key-value pairs. The keys are strings and so are the values. If you have a terminal open, you can see what your current environment looks like by typing env. It might look a little bit like this:

      ORBIT_SOCKETDIR=/tmp/orbit-cmccabe
      HOSTNAME=highcastle
      IMSETTINGS_INTEGRATE_DESKTOP=yes
      TERM=xterm
      SHELL=/bin/bash
      HISTSIZE=1000
      GTK_RC_FILES=/etc/gtk/gtkrc:/home/cmccabe/.gtkrc-1.2-gnome2
      WINDOWID=44042990
      QTDIR=/usr/lib64/qt-3.3
      QTINC=/usr/lib64/qt-3.3/include
      IMSETTINGS_MODULE=none
      JAVA_OPTS=-Xcheck:jni:nonfatal
      USER=cmccabe
      CSCOPE_EDITOR=/usr/bin/vim
      ...
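Inside a C program, the same list is reachable through the global `environ` variable, a NULL-terminated array of `KEY=VALUE` strings. The sketch below walks that array. `print_environment`, `scan_environment` and `scan_matches_getenv` are hypothetical helpers written for illustration, and the linear scan is a simplified model of a lookup, not any particular C library's getenv.

```c
#define _XOPEN_SOURCE 700  /* exposes setenv and getenv declarations */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Defined by POSIX: a NULL-terminated array of "KEY=VALUE" strings. */
extern char **environ;

/* Print each entry on its own line, roughly what `env` shows. */
void print_environment(void)
{
    if (environ == NULL)
        return;
    for (char **entry = environ; *entry != NULL; entry++)
        puts(*entry);
}

/* Linear scan for KEY; returns a pointer to its VALUE, or NULL if absent. */
const char *scan_environment(const char *key)
{
    size_t klen = strlen(key);
    if (environ == NULL)
        return NULL;
    for (char **entry = environ; *entry != NULL; entry++) {
        /* Match the whole key, so "FOO" does not match "FOOBAR=...". */
        if (strncmp(*entry, key, klen) == 0 && (*entry)[klen] == '=')
            return *entry + klen + 1;
    }
    return NULL;
}

/* Returns 1 if our scan agrees with the C library's getenv for this key. */
int scan_matches_getenv(const char *key)
{
    const char *mine = scan_environment(key);
    const char *libc = getenv(key);
    if (mine == NULL || libc == NULL)
        return mine == libc;
    return strcmp(mine, libc) == 0;
}
```

Because the scan reads the live array, any code that rewrites `environ` while a lookup is in the middle of this loop pulls the array out from under it.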

Environment variables are a simple and elegant way of passing around configuration information. It's easy to put a bunch of configuration variables into a script and source that script before your application starts. You don't need to write a parser or make your users learn a new configuration file syntax. Environment variables are accessible from any programming language and available on all of the major platforms (Linux, Windows, MacOS).

You can read an environment variable with getenv. You can remove a single variable with unsetenv, or all of them at once with clearenv (a common extension that, unlike the others, is not part of POSIX). You can set an environment variable with setenv or putenv.
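For reference, here is a minimal sketch of the standard calls, assuming a POSIX system. `env_roundtrip` and the variable name `EXAMPLE_COLOR` are made up for illustration. clearenv is left out because it is a nonstandard extension rather than part of POSIX.

```c
#define _XOPEN_SOURCE 700  /* exposes setenv and unsetenv in <stdlib.h> */
#include <stdio.h>
#include <stdlib.h>

/* Set, read, and remove a variable. EXAMPLE_COLOR is a made-up name. */
void env_roundtrip(void)
{
    /* setenv copies key and value; the final 1 means "overwrite if present". */
    if (setenv("EXAMPLE_COLOR", "blue", 1) != 0)
        perror("setenv");

    /* getenv returns a pointer into the environment, or NULL if unset. */
    const char *color = getenv("EXAMPLE_COLOR");
    printf("EXAMPLE_COLOR=%s\n", color != NULL ? color : "(unset)");

    /* With overwrite set to 0, an existing value is left alone. */
    setenv("EXAMPLE_COLOR", "red", 0);
    color = getenv("EXAMPLE_COLOR");
    printf("EXAMPLE_COLOR=%s\n", color != NULL ? color : "(unset)");

    /* unsetenv removes a single variable. */
    if (unsetenv("EXAMPLE_COLOR") != 0)
        perror("unsetenv");
    color = getenv("EXAMPLE_COLOR");
    printf("EXAMPLE_COLOR=%s\n", color != NULL ? color : "(unset)");
}
```

Calling `env_roundtrip()` should print `EXAMPLE_COLOR=blue` twice, because the second setenv passes 0 for overwrite, and then `EXAMPLE_COLOR=(unset)`.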

A Day at the Races

None of the POSIX environment functions are thread-safe. Even the humble getenv, which you might expect to be re-entrant, is not required to be. If you call getenv from multiple threads without a mutex serializing access, you are relying on implementation-specific behavior; POSIX makes no guarantees. I'm not aware of any implementation that actually has problems with this kind of code, but it's still worth noting. If, however, one thread modifies the environment while other threads are reading it, you will have problems on Linux: adding a variable can make the C library reallocate the environment array while another thread is still scanning it.

The solution is simple: hold a mutex while accessing or modifying the environment. However, this makes environment variables a lot more cumbersome to use. The lock also only works if every piece of code in the process takes it, which is hard to guarantee in a shared library, or in a scripting language embedded in a larger application, where you don't control the other callers.
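A minimal sketch of the mutex approach, assuming POSIX threads. The wrapper names (`locked_getenv`, `locked_setenv`, `locked_unsetenv`) and the copy-into-the-caller's-buffer convention are illustrative choices, not an existing API. Copying the value out while the lock is held matters, because the pointer getenv returns can be invalidated as soon as another thread changes the environment.

```c
#define _XOPEN_SOURCE 700  /* exposes setenv and unsetenv */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One process-wide lock. It only helps if EVERY piece of code that touches
 * the environment goes through these (hypothetical) wrappers. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy the value of key into buf while holding the lock, so the caller never
 * keeps a pointer into the environment after unlocking.
 * Returns 0 on success, -1 if the variable is unset or does not fit. */
int locked_getenv(const char *key, char *buf, size_t buflen)
{
    int rc = -1;
    pthread_mutex_lock(&env_lock);
    const char *value = getenv(key);
    if (value != NULL) {
        size_t len = strlen(value);
        if (len < buflen) {
            memcpy(buf, value, len + 1);  /* include the terminating '\0' */
            rc = 0;
        }
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}

int locked_setenv(const char *key, const char *value, int overwrite)
{
    pthread_mutex_lock(&env_lock);
    int rc = setenv(key, value, overwrite);
    pthread_mutex_unlock(&env_lock);
    return rc;
}

int locked_unsetenv(const char *key)
{
    pthread_mutex_lock(&env_lock);
    int rc = unsetenv(key);
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

The lock protects nothing from code that calls getenv or setenv directly, such as a third-party library, which is exactly why this gets awkward in larger programs.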

Modifying the Environment

POSIX doesn't tell you how to allocate space for new environment variables. The original set of environment variables passed to the process when it starts is not allocated using malloc. So if you add a new environment variable, should its string be allocated with malloc or not? Nobody knows. Similarly, when clearenv or unsetenv removes a variable, should the library call free on its string? Good question, but nobody knows the answer.

If you call free on a variable that was not originally allocated with malloc, heap corruption will result. On the other hand, if you don't free a variable that was malloced earlier, that is a memory leak. It's quite a dilemma, and POSIX is no help at all here.

There are actually three ways to set environment variables on Linux. You could use putenv, setenv, or modify the global variable environ. putenv is the simplest way. You give it a pointer to a string of the form KEY=VALUE, and it adds that pointer to the global environment. setenv tries to be more clever. It will use malloc to create a new KEY=VALUE string based on the KEY and VALUE that you pass to it.
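A small sketch of the difference, assuming a POSIX-conforming putenv such as glibc's, which stores the caller's pointer rather than a copy. On platforms where putenv copies its argument, the first effect described below would not occur. `putenv_vs_setenv`, `EXAMPLE_MODE` and `EXAMPLE_LEVEL` are made-up names.

```c
#define _XOPEN_SOURCE 700  /* exposes putenv and setenv */
#include <stdlib.h>
#include <string.h>

/* putenv keeps this exact pointer, so the buffer must outlive its time in
 * the environment; static storage guarantees that. */
static char example_mode[] = "EXAMPLE_MODE=fast";

void putenv_vs_setenv(void)
{
    /* putenv: the environment now refers to example_mode itself. */
    putenv(example_mode);
    /* Editing the buffer therefore edits the environment: afterwards
     * getenv("EXAMPLE_MODE") returns "slow" on a conforming putenv. */
    memcpy(example_mode + strlen("EXAMPLE_MODE="), "slow", 4);

    /* setenv builds its own "KEY=VALUE" copy of what we pass ... */
    char level[] = "fast";
    setenv("EXAMPLE_LEVEL", level, 1);
    /* ... so changing our local buffer has no effect: still "fast". */
    level[0] = 'l';
}
```

If `example_mode` were a local array instead, the environment would be left pointing into a dead stack frame once the function returned, which is the kind of bug putenv invites.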

Personally, I prefer to use putenv when adding environment variables. With putenv, you can avoid memory leaks by using statically allocated strings; setenv does not give you this choice, since it always allocates its own copy. Unfortunately, neither function is fully portable. setenv does not exist on HP-UX, Solaris, and some other UNIX platforms, while on Mac OS X, FreeBSD, and some ancient Linux platforms putenv is misimplemented to behave like setenv, copying the string instead of storing your pointer.

Conclusion

So there you have it. The situation is a mess. You can't safely access environment variables from multiple threads without a lock, and you can't portably call setenv or putenv without risking memory leaks.

Your best bet is probably to avoid modifying the environment at all, if possible. If that is not possible, try wrapping all accesses to the environment in a mutex and modifying environ yourself, so that nobody is calling malloc behind your back.
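One way to follow that advice is sketched below, under two loudly stated assumptions: every thread reads and writes the environment only through the hypothetical `owned_getenv`/`owned_setenv` wrappers, and nothing else in the process calls setenv, putenv or unsetenv. On first use it copies the inherited environment into memory it allocated itself, so it always knows which strings are safe to free. Removal is left out to keep the sketch short.

```c
#define _XOPEN_SOURCE 700  /* exposes strdup and getenv */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;
static char **owned_env = NULL;  /* our private copy of the environment array */
static size_t owned_count = 0;   /* entries in owned_env, not counting the NULL */

/* On first use, copy the inherited environment into memory we malloc'd
 * ourselves, so every string in it may be freed later. Lock must be held. */
static int adopt_environment(void)
{
    if (owned_env != NULL)
        return 0;
    size_t n = 0;
    while (environ != NULL && environ[n] != NULL)
        n++;
    char **copy = malloc((n + 1) * sizeof *copy);
    if (copy == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        copy[i] = strdup(environ[i]);
        if (copy[i] == NULL) {
            while (i > 0)
                free(copy[--i]);
            free(copy);
            return -1;
        }
    }
    copy[n] = NULL;
    owned_env = copy;
    owned_count = n;
    environ = owned_env;  /* getenv() now reads our array */
    return 0;
}

/* Set KEY=VALUE without calling setenv or putenv. Returns 0 or -1. */
int owned_setenv(const char *key, const char *value)
{
    size_t klen = strlen(key), vlen = strlen(value);
    if (klen == 0 || strchr(key, '=') != NULL)
        return -1;
    char *entry = malloc(klen + vlen + 2);  /* KEY + '=' + VALUE + '\0' */
    if (entry == NULL)
        return -1;
    memcpy(entry, key, klen);
    entry[klen] = '=';
    memcpy(entry + klen + 1, value, vlen + 1);

    pthread_mutex_lock(&env_lock);
    if (adopt_environment() != 0)
        goto fail;
    for (size_t i = 0; i < owned_count; i++) {
        if (strncmp(owned_env[i], key, klen) == 0 && owned_env[i][klen] == '=') {
            free(owned_env[i]);  /* safe: we allocated every string here */
            owned_env[i] = entry;
            pthread_mutex_unlock(&env_lock);
            return 0;
        }
    }
    char **grown = realloc(owned_env, (owned_count + 2) * sizeof *grown);
    if (grown == NULL)
        goto fail;
    grown[owned_count++] = entry;
    grown[owned_count] = NULL;
    owned_env = environ = grown;
    pthread_mutex_unlock(&env_lock);
    return 0;
fail:
    pthread_mutex_unlock(&env_lock);
    free(entry);
    return -1;
}

/* Copy KEY's value into buf under the lock. Returns 0, or -1 if unset/too long. */
int owned_getenv(const char *key, char *buf, size_t buflen)
{
    int rc = -1;
    pthread_mutex_lock(&env_lock);
    const char *value = getenv(key);
    if (value != NULL && strlen(value) < buflen) {
        strcpy(buf, value);
        rc = 0;
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

The strings inherited at startup are copied rather than freed, so the sketch never calls free on memory it did not allocate; the cost is one duplicate of the startup environment.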

Stepping back a little bit, are these memory leaks and race conditions really such a big deal? Well, even small memory leaks can be annoying when they clutter up the output of a leak checker such as valgrind. Also, once you allow incorrect code into your program, it tends to spread, as new programmers look to the old code for examples of how to do things. Do yourself a favor and get it right the first time. As Captain Planet would no doubt say, cleaning up the environment is everyone's responsibility.
