Some thoughts about the Java Native Interface

JNI, the Java Native Interface, is the oldest and most popular way of interfacing native code with Java code. It was designed by Sun Microsystems in the 1990s, along with the rest of Java. What exactly is "native code"? Well, in practical terms, it's C and C++ code.

Designing an interface to bind together these two different programming paradigms was never going to be trivial. To give one example, there needs to be a way for the C code to prevent Java objects it is using from being garbage collected out from under it.

Despite the difficulty of the problem, I still feel that JNI could have been a lot better than it is. It's just a lot more cumbersome and slow than it needs to be.

I think the designers of Java always anticipated that native code would eventually go away, and so they weren't too worried about optimizing access to it. Indeed, the introduction to the JNI book states:

The JNI allows programmers to take advantage of the power of the Java platform, without having to abandon their investments in legacy code.

"Legacy code" is presented as the only possible motivation for using JNI. Because we all know that soon, the world will convert to Java exclusively.

The Natives are Getting Restless

So what design decisions specifically do I disagree with in JNI? Well, one of them is the need to pass a JNIEnv pointer to every function. You even need to dereference the JNIEnv pointer just to find the function pointer itself!

Here's an example:

jstream = (*jenv)->NewObject(jenv, g_cls_rf_in_stream,
    g_mid_rf_in_stream_ctor, (jlong)(uintptr_t)ofe);
        
Every single JNI method needs to be looked up in a table of pointers, stored in the jenv object. This is not only cumbersome to write, but also slower than it needs to be.
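
To make the cost concrete, here is a minimal sketch of the same calling convention in plain C. The names mock_iface, mock_env and friends are invented for illustration and are not part of JNI; they only mirror the general shape of the real thing, where a JNIEnv is a pointer to a table of function pointers.

```c
/* Hypothetical stand-in for the JNI function table. */
struct mock_iface {
    int (*get_version)(const struct mock_iface **env);
    int (*add)(const struct mock_iface **env, int a, int b);
};

/* Like JNIEnv in C: a pointer to a constant table of function pointers. */
typedef const struct mock_iface *mock_env;

static int mock_get_version(mock_env *env) { (void)env; return 1; }
static int mock_add(mock_env *env, int a, int b) { (void)env; return a + b; }

static const struct mock_iface mock_table = { mock_get_version, mock_add };

/* An ordinary function: the call target is resolved by the linker. */
int direct_add(int a, int b)
{
    return a + b;
}

/* The JNI style: load the table pointer, load the slot, call indirectly. */
int call_through_table(mock_env *env, int a, int b)
{
    return (*env)->add(env, a, b);
}
```

Compared with direct_add, the table version has to load the table pointer and then the slot before it can make the call, and the caller has to spell all of that out by hand every time.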

There is already a mechanism in C for dynamically associating function calls with function implementations. It's called the dynamic linker. Every time you link your code against a foo.so library, you are making use of it. You will get whatever version of the foo library happens to be installed on the system when you run the program. You can even load new dynamic libraries after the program has started, using dlopen and friends.
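
For the record, here is roughly what run-time lookup looks like with dlopen and dlsym. This is only a sketch, assuming a POSIX system (older glibc versions also need -ldl at link time). The helper name call_strlen_dynamically is made up for the example.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns strlen(s) via a function pointer looked up at run time,
 * or -1 if the lookup fails. */
long call_strlen_dynamically(const char *s)
{
    /* Passing NULL asks for a handle to the running program and the
     * libraries already loaded with it, which normally includes libc. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* Look the symbol up by name, exactly as the dynamic linker would. */
    size_t (*fn)(const char *) =
        (size_t (*)(const char *))dlsym(handle, "strlen");
    long result = (fn != NULL) ? (long)fn(s) : -1;

    dlclose(handle);
    return result;
}
```

After the lookup, fn is an ordinary function pointer, and symbols the program links against in the usual way don't even need that much ceremony.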

This is all extremely basic technology that is decades old. It was definitely around when Sun was designing this interface. As the creators of SunOS, and later Solaris, Sun would have had people on the staff who not only knew what a dynamic loader was, but who had actually implemented it for those operating systems. So really, there is no excuse.

The String Fiasco

Let's consider a really basic task: copying a string from Java to C. How should you go about doing this? Well, you have a lot of different choices, but all of them are bad.

The first choice is GetStringUTFChars. This function will give you back a pointer to a NUL-terminated C string in UTF-8 (strictly speaking, Java's "modified UTF-8"). So far, so good. However, you don't get to hang on to this pointer. You must call ReleaseStringUTFChars on it promptly.

What does GetStringUTFChars actually do? Well, in nearly every real virtual machine I've ever heard of, the strings are stored as UCS-2 internally. So this function will have to make a copy, doing something similar to a malloc. Then ReleaseStringUTFChars is similar to a free. This is rather inefficient and we'd like to avoid it. What else is available?

There is also GetStringCritical, which purports to allow you to access the string more efficiently than GetStringUTFChars "on some platforms." They do this by potentially disabling garbage collection between calls to GetStringCritical and ReleaseStringCritical. Then, with garbage collection safely disabled, you can access the string directly, with no copying.

Notice I said that they "potentially" disable garbage collection. JVMs are free to do whatever they like, according to the JNI book. I suspect that on modern platforms, disabling garbage collection globally just because one thread called a wacky JNI function would not be a net performance gain. In that case, these functions are just stubs that fall back to copying the string, as GetStringChars may do. Luckily, there's no need for me to waste time doing any more research on this, because these functions are useless for another reason: they don't return UTF-8 strings. The reason is obvious: as a practical matter, JVMs use UCS-2 internally, not UTF-8.

There's a final API called GetStringUTFRegion. This one doesn't make an unnecessary copy, and it supplies true UTF-8 data. The JNI book gives this example:

JNIEXPORT jstring JNICALL 
Java_Prompt_getLine(JNIEnv *env, jobject obj, jstring prompt)
{
    /* assume the prompt string and user input has less than 128
       characters */
    char outbuf[128], inbuf[128];
    int len = (*env)->GetStringLength(env, prompt);
    (*env)->GetStringUTFRegion(env, prompt, 0, len, outbuf);
    printf("%s", outbuf);
    scanf("%s", inbuf);
    return (*env)->NewStringUTF(env, inbuf);
}
 
Do you see the problem? There is no bounds checking. If the Java string turns out to take more than 128 bytes in UTF-8 form, we get a buffer overflow. The scanf("%s", inbuf) call has the same flaw on the input side. I find it hard to believe that the folks at Sun actually put this example to paper. Didn't they know anything about proper programming practices? This is an error on par with using gets() in a program, a newbie mistake I would expect any professional programmer to be ashamed of.

The function arguments are brutally stupid. The function prototype is:

void GetStringUTFRegion(JNIEnv *env, jstring str,
    jsize start, jsize len, char *buf);
        
Ah, the function gives you a 'len' argument. You might think that you could use this to avoid the buffer overflow. But you can't. The 'len' argument is the length in UCS-2 units, not in bytes. The byte length may be completely different from the UCS-2 length, since UTF-8 is a variable-length encoding. So 99.999% of the time, when you simply want to copy the whole string, start and len are completely useless. It's a beginner's trap, especially when the manual itself gets it wrong. So what can you do? Well, you have to call GetStringLength to find out what to pass as the UCS-2 len. You also need to call GetStringUTFLength to find out if your output buffer is long enough. Finally, you need to call ExceptionCheck afterwards.
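
To see how far apart the two lengths can drift, here is a small sketch that counts the bytes a run of 16-bit Java chars needs in modified UTF-8. The function name modified_utf8_len is made up for this post. As far as I understand the encoding rules, it should agree with what GetStringUTFLength reports, but treat it as an illustration and not a replacement for the real call.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store n 16-bit Java chars as modified UTF-8,
 * not counting a terminating NUL. */
size_t modified_utf8_len(const uint16_t *units, size_t n)
{
    size_t bytes = 0;

    for (size_t i = 0; i < n; i++) {
        uint16_t u = units[i];

        if (u >= 0x0001 && u <= 0x007F)
            bytes += 1;  /* plain ASCII */
        else if (u <= 0x07FF)
            bytes += 2;  /* includes U+0000, which Java encodes as 2 bytes */
        else
            bytes += 3;  /* everything else, including each surrogate half */
    }
    return bytes;
}
```

Three 'a' characters need 3 bytes, but three euro signs (U+20AC) need 9, even though 'len' is 3 in both cases.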

What could have been one function call, properly designed, turns into:

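#include <errno.h>
#include <stddef.h>
#include <jni.h>

/* Copy a Java string into cstr as a NUL-terminated UTF-8 C string.
 * cstr_len is the size of cstr in bytes, including the terminator. */
int jstr_to_cstr(JNIEnv *jenv, jstring jstr, char *cstr,
    size_t cstr_len)
{
    jsize jlen, clen;

    /* Byte length of the UTF-8 form, not counting a terminator. */
    clen = (*jenv)->GetStringUTFLength(jenv, jstr);
    if ((size_t)clen >= cstr_len)
        return -ENAMETOOLONG;
    /* Length in 16-bit units, which is what GetStringUTFRegion wants. */
    jlen = (*jenv)->GetStringLength(jenv, jstr);
    (*jenv)->GetStringUTFRegion(jenv, jstr, 0, jlen, cstr);
    if ((*jenv)->ExceptionCheck(jenv))
        return -EIO;
    /* Terminate explicitly instead of relying on the JVM to do it. */
    cstr[clen] = '\0';
    return 0;
}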
        

To the best of my knowledge, this is still the most efficient and correct method to get a string out of Java.

Conclusion

So there you have it. Two major design flaws in JNI. I could probably list more, but I'm out of time for today. Good luck, and have fun trying to stuff the JNI back into the bottle.