Strange behaviour in Java compilers' string concatenation (caused by compile-time constants)

I'm sure thousands have stumbled on this before me, but it's still weird, and at first I could not explain why it works the way it does. At first glance it appeared that if you define a variable in a public class as public static final (i.e. a constant) and the variable has a primitive type, then other classes referencing this variable somehow "cache" its value in their own .class file. Looking deeper into the problem, I discovered that the situation is not as bad as I feared: I had just run into the javac compiler's string concatenation optimization, which does a bit "too much" for my taste.
Update: Jeff provided a reference to sections of the Java Language Specification that explain why and how this is working. The relevant term is: compile-time constants.

Let's see an example.
Create the file Alice.java as:
public class Alice {
  public static final int MY_STATIC_FIELD1 = 7;
  public static final Integer MY_STATIC_FIELD2 = 11;
}

Create the file Bob.java as:
public class Bob {
  public static void main(String[] args) {
    System.out.println("var1: " + Alice.MY_STATIC_FIELD1);
    System.out.println("var2: " + Alice.MY_STATIC_FIELD2);
  }
}

Now compile them and run Bob by executing the following commands:
javac Alice.java
javac Bob.java
java -cp . Bob

The output will be the values of Alice's static variables:
var1: 7
var2: 11

Now change the value of both variables in Alice.java to something else, e.g.:
public class Alice {
  public static final int MY_STATIC_FIELD1 = 14;
  public static final Integer MY_STATIC_FIELD2 = 22;
}

Compile it (just Alice.java!) and run Bob again:
javac Alice.java
java -cp . Bob

The value of var1 did not change, but the value of var2 did!
This is not some stupid mistake (at least it's not mine). The two invocations of Bob's main() method occurred in different JVM processes, i.e. totally independently of each other.

Now get a hex editor and take a look at the bytecode in the Bob.class file. You will find that it contains the value of Alice.MY_STATIC_FIELD1 "burnt" into the bytecode. What actually happens is that when the compiler evaluates the System.out.println("var1: " + Alice.MY_STATIC_FIELD1); line in Bob.java, it takes the value of Alice.MY_STATIC_FIELD1, appends it to the "var1: " string and stores the concatenated string ("var1: 7") in Bob.class. Imho this is bad practice.
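If you don't have a hex editor at hand, the effect can also be observed from within a running program. The following is a minimal sketch, not part of the original experiment: it assumes a single file with Alice reproduced as a nested class, and the class name ConstantFoldingDemo and the methods folded() and notFolded() are made up for illustration. It relies on the rule that String-typed constant expressions are interned, so an identity comparison (==) with the equivalent literal succeeds only when the compiler folded the whole concatenation.

```java
public class ConstantFoldingDemo {
    static class Alice {
        // Compile-time constant: final, primitive type, constant initializer.
        public static final int MY_STATIC_FIELD1 = 7;
        // Not a compile-time constant: Integer is neither primitive nor String.
        public static final Integer MY_STATIC_FIELD2 = 11;
    }

    // The compiler folds this expression into the single literal "var1: 7".
    public static String folded() {
        return "var1: " + Alice.MY_STATIC_FIELD1;
    }

    // This concatenation happens at run time and reads the field from Alice.
    public static String notFolded() {
        return "var2: " + Alice.MY_STATIC_FIELD2;
    }

    public static void main(String[] args) {
        // Identity comparison on purpose: it is true only for interned
        // strings, i.e. when the expression was a compile-time constant.
        System.out.println("var1 folded: " + (folded() == "var1: 7"));
        System.out.println("var2 folded: " + (notFolded() == "var2: 11"));
    }
}
```

Both strings have the expected contents; only their identity differs, which mirrors the difference between a value stored in the referencing class and a value looked up at run time.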

I stumbled on this bug by changing a constant's value in a library (a lib that was still under development and had no release version yet), building a JAR from it and then deploying it into an application's lib directory. The application still used the value from the old version of the library's JAR, since the application's JAR had been compiled against the old library JAR.

Now try the same with a variable of type String. It works the same as with primitive types (i.e. the value of the static String gets stored in the class that references it)! So the compiler does not decide based on whether the variable is of a primitive type or an Object (since both int and String "public static final" variables are stored within the referencing class).

But how can you get around this "feature" of the Sun JDK's compiler? In case of an integer value it's simple: use an Integer object instead of the int primitive type. We've already observed in our example that this works. But what about strings? We saw that the value of a constant String object was compiled into the referencing class's bytecode.

This whole thing is some sort of "code optimization" on the part of the Sun JDK compiler. Unfortunately I did not find any options/switches in the documentation of javac that could alter the default (imho buggy) behaviour (as most other compilers would allow me to do).

Btw. did you know about the javap class file disassembler that is shipped with the JDK? It does not give you real Java source code, but it's close enough that you can reverse-engineer the logic. I'm sure there are dozens of disassemblers (probably a few free/open-source ones too) that will give you a full disassembly of a class file.

I've found a topic in Sun's forums on string optimization in javac. The only difference is that in my case the optimizer concatenates a value from an external class, which can (and has a real chance to) change after the compilation of the referencing class. Judging by the mentioned forum post, string optimization concatenates only up to the point where the string's value is still "sure" to be constant. From that point on, the remaining components are not optimized any further.

So my_string in code like this:
String my_var = "blabla";
String my_string = "a" + "b" + "c" + my_var + "d" + "e" + "f";

will be optimized into this:
"abc" + my_var + "d" + "e" + "f"

If my_var in the above example were a "static final" variable, then the optimizer would have merged the whole bunch together in my_string into this:
"abcblabladef"
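The difference between the two cases can be checked at run time with a small sketch. The class name PartialFoldingDemo, the constant MY_CONST and both method names are hypothetical; the check assumes (as the language spec states for constant expressions of type String) that a fully folded result is interned, so it is identical (==) to the literal "abcblabladef", while a result assembled at run time is a new String object with the same contents.

```java
public class PartialFoldingDemo {
    // "static final" String with a constant initializer: a compile-time constant.
    static final String MY_CONST = "blabla";

    // my_var is an ordinary local variable, so only the leading "a" + "b" + "c"
    // can be merged at compile time; the rest is concatenated at run time.
    public static String withPlainVariable() {
        String my_var = "blabla";
        return "a" + "b" + "c" + my_var + "d" + "e" + "f";
    }

    // Every operand is constant, so the compiler can emit one literal.
    public static String withConstant() {
        return "a" + "b" + "c" + MY_CONST + "d" + "e" + "f";
    }

    public static void main(String[] args) {
        String expected = "abcblabladef";
        // Same contents in both cases ...
        System.out.println(withPlainVariable().equals(expected));
        System.out.println(withConstant().equals(expected));
        // ... but only the fully folded one is the interned literal itself.
        System.out.println("plain variable folded: " + (withPlainVariable() == expected));
        System.out.println("constant folded: " + (withConstant() == expected));
    }
}
```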

(Btw. if you're interested in interpreting bytecode, read the Java Virtual Machine Specification. It's available online for free. You can even start building your own Java compiler based on that.)

A trivial "workaround" is to remove the "static final" keywords from the declaration of the variable, and the optimizer will not merge its value into string concatenations. However, this is not a real alternative since it defeats the whole point of using _constants_ (i.e. if I remove "static final", then it's not a constant anymore). It seems that "static final" String objects are optimized during string concatenation because the compiler "knows" that their value won't change (String objects are immutable; there's no method to change their contents). But what about the "static final" Integer in my example? Integer is immutable too, isn't it? At least I did not find any methods in the Integer class that could modify the int that the object wraps. So why are "static final" Integers not optimized in the same way? Maybe that's yet to come in a future version of javac.

Another trivial workaround: use objects instead of primitive types (and String) to store your constants. E.g. you could declare the constants as Object and use explicit type designators to improve readability.
E.g. Alice could look like this:
public class Alice {
  public static final Object MY_STATIC_FIELD1 = 7;
  public static final Object MY_STATIC_FIELD2 = (int) 7;
  public static final Object MY_STATIC_FIELD3 = new Integer(7);
}

All three notations produce an Integer with the value 7 (the first two through autoboxing, the third through an explicit constructor call), but the latter two are easier to recognize/read. Declaring the field as Object prevents the optimizer from merging its value into string concatenations, because only variables of a primitive type or of type String qualify as compile-time constants.

Of course you could simply make sure that once you declare a "static final" constant in a class (and rely on its value in another class in a string concatenation), you never ever change the constant's value in the source code. To avoid the problem altogether, I suggest you follow this simple rule: always recompile your code if you intend to replace any libraries under it. Never replace a lib (i.e. a JAR) in your application without recompiling your app. That should keep you safe.

To summarize this post: the Sun JDK javac compiler optimizes string concatenations. Class constants (variables declared as "static final") are subject to this optimization too, and developers should be aware of that and work around it if necessary.

P.S.: I've checked and confirmed this behaviour to be present both in JDK 5 (Update 16) and JDK 6 (Update 10) on the Linux platform. I assume it's present in Sun JDKs on all platforms.

Comments

It's part of the Java Language Specification

It isn't a bug. It's formally specified by the JLS. These cases are called compile-time constants and have special rules regarding binary compatibility, initialization, etc. The compiler is just following the rules. See:

JLS Section 4.12.4 and JLS Section 13.4.9

The second section explains why constant inlining is required for switch statements.

If you don't want this behavior, it is very simple to work around by making sure your static final variables are not compile-time constants. You just have to pass them through a function at initialization time. Of course, they won't work with switch statements either in this case.

Here's an example pattern:

  public static class StaticFinals {
    public static String dynamic(String s) { return s; }
    public static int dynamic(int i) { return i; }
    // TODO implement methods for each primitive type
  }
 
  public static class ExampleProgram {
    public static final String MY_STATIC_STRING = StaticFinals.dynamic("Might change");
    public static final int MY_STATIC_INT = StaticFinals.dynamic(42);   
  }
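A minimal runnable sketch of the same pattern, assuming everything lives in one file: the class name DynamicConstantDemo, the constants INLINED_STRING, DYNAMIC_STRING and DYNAMIC_INT, and the methods viaInlined() and viaDynamic() are made up for illustration. It compares a plain constant with one passed through dynamic(), using the fact that a folded concatenation is an interned string and therefore identical (==) to the matching literal.

```java
public class DynamicConstantDemo {
    // Hypothetical helpers in the spirit of StaticFinals.dynamic(): a method
    // call in the initializer means it is not a constant expression anymore.
    static String dynamic(String s) { return s; }
    static int dynamic(int i) { return i; }

    // Compile-time constant: its value is copied into every referencing class.
    static final String INLINED_STRING = "Might change";
    // Not compile-time constants: referencing classes read them at run time.
    static final String DYNAMIC_STRING = dynamic("Might change");
    static final int DYNAMIC_INT = dynamic(42);

    public static String viaInlined() { return "value: " + INLINED_STRING; }
    public static String viaDynamic() { return "value: " + DYNAMIC_STRING; }

    public static void main(String[] args) {
        // Identity comparison on purpose: true only for a folded, interned string.
        System.out.println("inlined folded: " + (viaInlined() == "value: Might change"));
        System.out.println("dynamic folded: " + (viaDynamic() == "value: Might change"));
        // DYNAMIC_INT still behaves like a normal constant at run time, but it
        // could not be used as a case label in a switch statement.
        System.out.println("int value: " + DYNAMIC_INT);
    }
}
```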

Re: It's part of the Java Language Specification

Thank you for the reply! I was hoping that someone could give me some plausible explanation on why it's working as it does.
The language spec. (the second link in your post) makes it clear that this is how it's supposed to work according to the spec. designers, but I still think that this is a "feature" that is not advertised enough. I mean, how many developers read the whole spec. from start to end?
The spec. says (as you also pointed out) the following:
One reason for requiring inlining of constants is that switch statements require constants on each case, and no two such constant values may be the same. The compiler checks for duplicate constant values in a switch statement at compile time; the class file format does not do symbolic linkage of case values.
But what other reasons are there? If that's the only reason, then I'd vote for removing the inlining of compile-time constants and changing how switch statements work. The advantages of such a change seem to outweigh the drawbacks (at least for me).

We should also point visitors/readers to the "15.28 Constant Expression" section, which explains thoroughly what compile-time constant expressions are, so they can watch out for them.

Thanks Jeff for sharing the info!

String compilation

Nice information, questions asked, answers given. Thanks a lot.

I am also confused about how the compiler handles a normal String variable.
If I write a String variable like:
String str = "ABCD";
Then will the compiler optimize it and treat it similar to static final String?

Re: String compilation


If I write a String variable like:
String str = "ABCD";
Then will the compiler optimize it and treat it similar to static final String?

Yes and no.
The answer depends on what you mean by "treat it similar to static final String".
If you are asking whether referencing the variable from another class will result in its value being compiled into the referencing class, then the answer is no.

Here's a simple example to prove it in practice.

Have a class in a source file named "Alice.java" with the following contents:
public class Alice {
  String str = "ABCD";
}

Have a class in a source file named "Bob.java" with the following contents:
public class Bob {
  public static void main(String[] args) {
    Alice a = new Alice();
    System.out.println("var1: " + a.str);
  }
}

Now compile both and look into "Bob.class" either with a hex editor or using the strings command (if you're on a Linux or Unix platform). You won't find an instance of the string "ABCD" in "Bob.class", thus it was not treated as a compile-time constant.

Technically, the initializer "ABCD" in the code String str = "ABCD"; is a compile-time constant expression as described in chapter "15.28 Constant Expression" of the Java Language Specification. However, the variable str itself is not declared final, so it is not a constant variable, and the reference to str in Bob.java is not treated as a compile-time constant, since it does not match any of the cases described in the definition of compile-time constant expressions.