Wednesday, July 22, 2009

Java is not the JVM

For many IT people, it sounds odd to assert that the Java language has nothing to do with the JVM itself. But however unlikely it sounds, this is actually true. Let me explain, using some code as a simple illustration of how this is the case.

When I was hacking at the Java bytecode level, one of the things I did was optimise for memory efficiency. I needed to store an array of booleans, and the most obvious way to save memory was to store them bitwise, packing 8 boolean values into a single byte.

Within the JVM, booleans in an array are stored as bytes (and during execution it is worse: the VM operates on booleans as ints!). Furthermore, Java offers no low-level way to use a boolean as an integral type, as C does. If you write this in pure Java, at best you end up with code like this:


// assume z is a boolean[8]
byte b = 0;
for (int i = 0; i < 8; i++) {
    if (z[i]) {
        b |= 1 << i;
    }
}



Compared with C, this code is clunky: you have to check each boolean in a conditional before you can perform any bitwise operation, because Java considers boolean a non-integral type. How annoying!
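The nearest pure-Java approximation of the C idiom is to turn each boolean into 0 or 1 with a conditional expression before shifting. It reads a little tidier, but javac typically still compiles the conditional expression into a branch, so the check does not really go away. The sketch below also shows the reverse direction; the class and method names (`BoolPacking`, `pack`, `unpack`) are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical helper class illustrating bit-packing of booleans in pure Java.
public class BoolPacking {

    // Packs up to 8 booleans into one byte: z[i] becomes bit i.
    static byte pack(boolean[] z) {
        if (z.length > 8) {
            throw new IllegalArgumentException("at most 8 booleans fit in a byte");
        }
        byte b = 0;
        for (int i = 0; i < z.length; i++) {
            // Java has no boolean-to-int conversion, so map to 0/1 explicitly.
            // The compound assignment narrows the int result back to a byte.
            b |= (z[i] ? 1 : 0) << i;
        }
        return b;
    }

    // Unpacks the low n bits of b back into a boolean array.
    static boolean[] unpack(byte b, int n) {
        boolean[] z = new boolean[n];
        for (int i = 0; i < n; i++) {
            z[i] = ((b >> i) & 1) != 0;
        }
        return z;
    }

    public static void main(String[] args) {
        boolean[] z = {true, false, true, true, false, false, false, true};
        byte b = pack(z);
        // Bits 0, 2, 3 and 7 are set (0x8D), printed as the signed byte -115.
        System.out.println(b);
        System.out.println(Arrays.toString(unpack(b, 8)));
    }
}
```

Note that Java's byte is signed, so a packed value with bit 7 set prints as a negative number even though the bit pattern is exactly what you asked for.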

But this constraint only applies to the Java language; the same rules do not apply to the JVM. On the VM, it is perfectly legitimate to do the equivalent of this:


// assume z is a boolean[8]
// (not legal Java: this only expresses what the bytecode is allowed to do)
byte b = 0;
for (int i = 0; i < 8; i++) {
    b |= z[i] << i;
}


However, any conforming Java compiler will refuse to compile this code, because shifting a boolean violates the language's type rules. Don't blame the compilers; they are just conforming to the language specification. But since the JVM has nothing to do with the Java language, there is nothing illegal about doing this outside the Java language, say, by writing bytecode assembly directly. Here's an equivalent in Jasmin assembly code:

.source BooleanToByte.j
.class BooleanToByte
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
    .limit stack 4
    .limit locals 3

    iconst_0
    istore_1              ; byte b = 0;

    iconst_0
    istore_2              ; int i = 0;

LOOP:

    iload_2
    bipush 8
    if_icmpge LOOP_EXIT   ; if i >= 8 (z.length), exit loop

    ; here's the magic code that allows you to do direct
    ; bitwise operations on a boolean: b |= z[i] << i;
    ; (assumes local 0 holds the boolean[] z; in main it is
    ; really the String[] args, so the verifier rejects this)
    iload_1
    aload_0
    iload_2
    baload                ; a boolean element loads as an int, 0 or 1
    iload_2
    ishl
    ior
    istore_1

    iinc 2 1
    goto LOOP

LOOP_EXIT:

    return

.end method

The Jasmin code should assemble, but don't expect the JVM to execute it; it serves only as an example and lacks a few things (there is no constructor block, and z is never actually created: local 0 in main holds the String[] args, so the bytecode verifier will reject the baload). It is only a case study showing that the JVM and the Java language are separate things, contrary to what people typically assume.
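If the practical goal is simply compact boolean storage rather than a tour of the bytecode, the standard library's `java.util.BitSet` already stores one boolean per bit. Below is a minimal sketch, assuming the same 8 flags as in the loops above; the class and method names (`BitSetPacking`, `pack`, `get`) are hypothetical. `BitSet.toByteArray()` lays the bits out little-endian, so bit i of the set lands in bit i % 8 of byte i / 8, which matches the manual loop for the first byte.

```java
import java.util.BitSet;

// Hypothetical helper showing bit-packed boolean storage via java.util.BitSet.
public class BitSetPacking {

    // Packs the booleans in z into a byte array, one bit per flag.
    static byte[] pack(boolean[] z) {
        BitSet bits = new BitSet(z.length);
        for (int i = 0; i < z.length; i++) {
            bits.set(i, z[i]);   // sets or clears bit i
        }
        // toByteArray() is little-endian and omits trailing zero bytes,
        // so an all-false input yields an empty array.
        return bits.toByteArray();
    }

    // Reads flag i back out of a packed byte array.
    // Rebuilds the BitSet on every call, which is fine for a sketch.
    static boolean get(byte[] packed, int i) {
        return BitSet.valueOf(packed).get(i);
    }

    public static void main(String[] args) {
        boolean[] z = {true, false, true, true, false, false, false, true};
        byte[] packed = pack(z);
        System.out.println(packed.length);   // 1
        System.out.println(packed[0]);       // -115, i.e. bits 0, 2, 3 and 7
        System.out.println(get(packed, 3));  // true
    }
}
```

The trade-off is that BitSet hides the packing behind method calls, so you get the memory savings without ever touching a shift operator on a boolean.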

A number of other languages that rely on the JVM as their core have since mushroomed; these include Groovy, Scala, Jython and JRuby. Many of them are rather interesting, although they are more of a curiosity at this stage: I've yet to see any of these implementations deployed in a production environment, and I don't say that as a criticism of any of these languages. In fact, I am quite impressed with JRuby, and I recommend you give it a try. It is very faithful to the original Ruby implementation and lets you use Java directly. Good fun, I'd say, especially as it combines the expressiveness of the former with the features of the latter. It's impressive that the JVM has proved versatile enough to let other languages plug into it directly.
