
The dangers of referential equality in Java

In this post I'll write about referential equality (i.e. the == operator) and why it can be very dangerous to use this operator incorrectly in Java. I made this mistake today, and it is my hope that this blog post will help you avoid making the same mistake yourself (or at least get a good laugh). :wink:

Here was my use case today. I had some code that looked like this:

    List<T> allObjectSnapshots = dao.queryForAll();

    for ( IdTime<ID> idTime : result ) {
        Optional<T> matchingSnapshot = allObjectSnapshots.stream()
                .filter( obj -> obj.getId() == idTime.getId() && obj.getTime().equals( idTime.getTime() ) )
                .findFirst();

        if ( matchingSnapshot.isPresent() ) {
            latestObjectSnapshotsBuilder.put( idTime.getId(), matchingSnapshot.get() );
        }
        else {
            logger.warn( "Failed to locate latest snapshot for {} {}", dao.getTableName(), idTime.getId() );
        }
    }

Both idTime.getId() and obj.getId() are defined to return a value of type ID (generic type parameter to the method in question), and I had previously used this with an Integer parameter. It worked correctly, and I hadn't thought much about whether the comparison above (obj.getId() == idTime.getId()) was correct or not.

All of that changed today. I was making some changes to the code where I also needed to use String values as an ID. This meant that obj.getId() would now return a String instead of an Integer.

The problem: String instances simply cannot be compared this way

And obviously (for those of you who know your Java by heart), this did not work. The reason is easy to see when you look at some example values in jshell:

$ jshell
|  Welcome to JShell -- Version 11.0.11
|  For an introduction type: /help intro

jshell> String s1 = new String("foo")
s1 ==> "foo"

jshell> String s2 = new String("foo")
s2 ==> "foo"

jshell> s1 == s2
$3 ==> false

jshell> s1.equals(s2)
$4 ==> true

The reason is simple: these two strings refer to different object instances. The content equality check (s1.equals(s2)) works, but == performs a "referential equality" check, which fails here because s1 and s2 are two distinct String objects, even though their contents are identical. Other languages (C#, I'm jealously looking at you!) have overloaded the == operator for the String class to do the least unexpected thing (i.e. check for "content equality"), but Java doesn't work that way, at least not right now. Valhalla might fix this if we are lucky; more about this later.
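For a generic ID type like the one in my use case, a comparison that does not depend on object identity could look roughly like the sketch below. The class and method names (IdEquality, sameId, findFirstMatching) are made up for this illustration and are not part of the original code; the only library call it relies on is java.util.Objects.equals(), which is null-safe and delegates to equals().

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical helper class, only for illustration.
class IdEquality {
    // Compares two IDs by content, not by reference. Null-safe.
    static <ID> boolean sameId(ID left, ID right) {
        return Objects.equals(left, right);
    }

    // Mirrors the shape of the stream filter in the post, but uses
    // content equality for the ID comparison.
    static <ID> Optional<ID> findFirstMatching(List<ID> ids, ID wanted) {
        return ids.stream()
                .filter(id -> sameId(id, wanted))
                .findFirst();
    }

    public static void main(String[] args) {
        String s1 = new String("foo");
        String s2 = new String("foo");

        System.out.println(s1 == s2);        // false: two distinct instances
        System.out.println(sameId(s1, s2));  // true: same content

        // Works the same way when the ID is a String created at runtime.
        Optional<String> match = findFirstMatching(List.of("a", "b"), new String("b"));
        System.out.println(match.isPresent()); // true
    }
}
```

Whether a helper like this is worth having, compared to simply calling equals() directly in the filter, is a matter of taste; the point is only that the comparison goes through equals() rather than ==.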

One gotcha: String literals are the exception that proves the rule

Note, however, that there are exceptions to the above rule. String literals with the same content are, interestingly enough, "interned" to refer to the same instance. This is explained in JLS 3.10.5. See this example for an illustration:

$ jshell
|  Welcome to JShell -- Version 11.0.11
|  For an introduction type: /help intro

jshell> String s1 = "foo";
s1 ==> "foo"

jshell> String s2 = "foo";
s2 ==> "foo"

jshell> s1 == s2
$3 ==> true

jshell> s1.equals(s2)
$4 ==> true

Both of these strings refer to the same String instance. They can be compared using both referential equality (s1 == s2) and value equality (s1.equals(s2)).
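To see the difference between a literal and a string created at runtime side by side, here is a small sketch. The class name InternDemo and the helper sameInstance are invented for this example; String.intern() is the standard method that returns the canonical, pooled instance for a given content.

```java
// Hypothetical demo class, only for illustration.
class InternDemo {
    // Wraps the == operator so the identity check is explicit.
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "foo";
        String built = new String("foo"); // 'new' always creates a fresh instance

        System.out.println(sameInstance(literal, built));          // false
        System.out.println(sameInstance(literal, built.intern())); // true: pooled instance
        System.out.println(literal.equals(built));                 // true
    }
}
```

In other words, == only "works" for strings when both sides happen to come from the string pool, which is not something calling code can usually rely on.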

How did this ever work with Integer values in my use case?

This is now the $1,000,000 question...

How did this ever work with my previous Integer parameter?!?

Well, there is again a significant exception to the rule, one that can bite you really hard if you are unlucky. I'll start with the examples from jshell, and then try to explain why it works like this:

Boxed small integers: refer to the same Integer instance

$ jshell
|  Welcome to JShell -- Version 11.0.11
|  For an introduction type: /help intro

jshell> Integer i = 1;
i ==> 1

jshell> Integer j = 1;
j ==> 1

jshell> i == j
$3 ==> true

Boxed larger integers: do not refer to the same Integer instance

jshell> Integer k = 1048576;
k ==> 1048576

jshell> Integer l = 1048576;
l ==> 1048576

jshell> k == l
$6 ==> false

Now, if the previous semantic details for how Strings are handled weren't enough, this should be more than enough to make a grown man cry... :joy:

Some more details around this can be found in the Integer.valueOf() implementation. As far as I can tell, this is the method the compiler inserts a call to whenever an int (integer primitive) value is auto-boxed into an Integer (a full Java object with object identity, wrapping an int).

Interestingly enough, the code in the linked class goes as far as to say that this is actually required by the JLS for values between -128 and 127. I presume it's a quite critical optimization in the JVM; the alternative (to always instantiate a new Integer object every single time an int is auto-boxed) would likely have a huge performance impact, both in terms of memory allocation and, perhaps even more importantly, GC pressure.
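The practical consequence can be sketched like this. The class and method names (IntegerCacheDemo, sameInstance, sameValue) are made up for the example. Only the identity result for the small value is printed with an expected outcome, since that range is the one the Integer.valueOf() documentation promises to cache; for larger values the result of == depends on the cache configuration, so the sketch compares those with equals() instead.

```java
// Hypothetical demo class, only for illustration.
class IntegerCacheDemo {
    // Identity comparison: true only if both references point to the same object.
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    // Value comparison: what you almost always want for boxed integers.
    static boolean sameValue(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Integer i = 127;
        Integer j = 127;
        System.out.println(sameInstance(i, j)); // true: within the cached range

        Integer k = 1048576;
        Integer l = 1048576;
        // sameInstance(k, l) is not reliable here, so compare by value.
        System.out.println(sameValue(k, l));                // true
        System.out.println(k.intValue() == l.intValue());   // true: compares primitives
    }
}
```

Unboxing with intValue() and comparing the primitives is an alternative to equals() when both values are known to be non-null.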

Conclusions

Needless to say, this is definitely something that can be a huge gotcha in current Java versions, and it's easy for people like me, and probably others as well, to make simple mistakes like this. It's perhaps even easier to make mistakes like this if you regularly work with other languages like C#, which has more "relaxed" semantics (where s1 == s2 will always work, regardless of whether s1 and s2 are string literals or dynamically constructed strings).

Luckily, the good people in the JDK project are actively working on improving this. Quoting the "State of Valhalla Part 2" design note linked below:

Many of the impediments to optimization that Valhalla seeks to remove center around unwanted object identity. The primitive wrapper classes have identity, but not only is this identity not directly useful, it can be a source of bugs. (For example, due to caching, Integers can be accidentally compared correctly with == just often enough that people keep doing it.)

Well, yes - this was exactly what happened to me. Thanks to Sebastian Lövdahl for sharing that link with me after I had cried out to him in agony today. :grin:

Further reading