The dangers of referential equality in Java
In this post I'll write about referential equality (i.e. the == operator) and why it can be very dangerous to use this operator incorrectly in Java. I made this mistake today, and it is my hope that this blog post will help you avoid making the same mistake yourself (or at least give you a good laugh).
Here was my use case today. I had some code that looked like this:
    List<T> allObjectSnapshots = dao.queryForAll();
    for ( IdTime<ID> idTime : result ) {
        Optional<T> matchingSnapshot = allObjectSnapshots.stream()
                .filter( obj -> obj.getId() == idTime.getId() && obj.getTime().equals( idTime.getTime() ) )
                .findFirst();
        if ( matchingSnapshot.isPresent() ) {
            latestObjectSnapshotsBuilder.put( idTime.getId(), matchingSnapshot.get() );
        }
        else {
            logger.warn( "Failed to locate latest snapshot for {} {}", dao.getTableName(), idTime.getId() );
        }
    }
}
Both idTime.getId() and obj.getId() are defined to return a value of type ID (a generic type parameter of the method in question), and I had previously used this with an Integer parameter. It worked correctly, and I hadn't thought much about whether the comparison above (obj.getId() == idTime.getId()) was correct or not.
All of that changed today. I was making some changes to the code where I also needed to use String values as an ID. This meant that obj.getId() would now return a String instead of an Integer.
The problem: String instances simply cannot be compared this way
And obviously (for those of you who know your Java by heart), this did not work. This is easy to see when you look at some example values in jshell:
$ jshell
| Welcome to JShell -- Version 11.0.11
| For an introduction type: /help intro
jshell> String s1 = new String("foo")
s1 ==> "foo"
jshell> String s2 = new String("foo")
s2 ==> "foo"
jshell> s1 == s2
$3 ==> false
jshell> s1.equals(s2)
$4 ==> true
The reason is simple: these two strings refer to different object instances. The content equality check (s1.equals(s2)) works, but == performs a "referential equality" check, which fails for two distinct String objects even when their contents are identical. Other languages (C#, I'm jealously looking at you!) have overloaded the == operator for the String class to do the least unexpected thing (i.e. check for "content equality"), but Java doesn't work that way, at least not right now; Valhalla might fix this if we are lucky. More about this later.
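For a lookup like the one in my use case, one way to make the comparison independent of the ID type is java.util.Objects.equals(), which delegates to equals() and also tolerates null values. The following is only a minimal sketch, not my actual code: the Snapshot class, the SnapshotLookup class and the findMatch helper are hypothetical stand-ins for the real types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

class SnapshotLookup {
    // Compares IDs by content, so it behaves the same for Integer, String
    // or any other ID type with a sensible equals() implementation.
    static <ID> Optional<Snapshot<ID>> findMatch(List<Snapshot<ID>> snapshots, ID wantedId) {
        return snapshots.stream()
                .filter(snapshot -> Objects.equals(snapshot.getId(), wantedId))
                .findFirst();
    }

    public static void main(String[] args) {
        List<Snapshot<String>> snapshots = Arrays.asList(
                new Snapshot<>(new String("foo")),
                new Snapshot<>(new String("bar")));

        // A distinct String instance with the same content as a stored ID.
        String wantedId = new String("bar");

        System.out.println(findMatch(snapshots, wantedId).isPresent()); // true
        System.out.println(findMatch(snapshots, "baz").isPresent());    // false
    }
}

// Hypothetical stand-in for the real snapshot type; only the ID matters here.
class Snapshot<ID> {
    private final ID id;

    Snapshot(ID id) {
        this.id = id;
    }

    ID getId() {
        return id;
    }
}
```

With == in place of Objects.equals() in the filter, the first lookup would come back empty, because wantedId and the stored ID are different String instances.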
One gotcha: String literals are the exception that proves the rule
Note, however, that there are exceptions to the above rule. String literals with the same content are, interestingly enough, "interned" to refer to the same instance. This is explained in JLS 3.10.5. See this example for an illustration:
$ jshell
| Welcome to JShell -- Version 11.0.11
| For an introduction type: /help intro
jshell> String s1 = "foo";
s1 ==> "foo"
jshell> String s2 = "foo";
s2 ==> "foo"
jshell> s1 == s2
$3 ==> true
jshell> s1.equals(s2)
$4 ==> true
Both of these strings refer to the same String instance. They can be compared using both referential equality (s1 == s2) and value equality (s1.equals(s2)).
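The same behaviour can be reproduced outside jshell. The sketch below (the class and method names are made up for this post) contrasts two literals, a String created with new, and the result of String.intern(), which returns the canonical instance from the same pool that literals end up in.

```java
class InternDemo {
    // Deliberate identity comparison, for demonstration purposes only.
    static boolean sameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "foo";
        String anotherLiteral = "foo";
        String constructed = new String("foo");

        System.out.println(sameInstance(literal, anotherLiteral));       // true
        System.out.println(sameInstance(literal, constructed));          // false
        System.out.println(sameInstance(literal, constructed.intern())); // true
        System.out.println(literal.equals(constructed));                 // true
    }
}
```

Only equals() gives the same answer in all of these cases, which is why it is the safer default.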
How did this ever work with Integer values in my use case?
This is now the $1,000,000 question...
How did this ever work with my previous Integer parameter?!?
Well. There is again a significant exception to the rule, one that can bite you really hard if you are unlucky. I'll start with the example from jshell first, and then try to explain why it works like this:
Boxed small integers refer to the same Integer instance
$ jshell
| Welcome to JShell -- Version 11.0.11
| For an introduction type: /help intro
jshell> Integer i = 1;
i ==> 1
jshell> Integer j = 1;
j ==> 1
jshell> i == j
$3 ==> true
Boxed larger integers do not refer to the same Integer instance
jshell> Integer k = 1048576;
k ==> 1048576
jshell> Integer l = 1048576;
l ==> 1048576
jshell> k == l
$6 ==> false
Now, if the previously mentioned semantic details for how Strings are handled (s1 == s2 sometimes working and sometimes not) weren't enough, the above boxed Integer weirdness should be more than enough to make a grown man cry...
Some more details around this can be found in the Integer.valueOf() implementation. I can't say I know for sure, but I presume this is the method that gets called whenever an int (integer primitive) value is auto-boxed into an Integer (a full Java object with object identity, wrapping an int). At least javac compiles auto-boxing into a call to it.
Interestingly enough, the code in the linked class goes as far as to say that this is actually required by the JLS for values between -128 and 127. I presume it's a quite critical optimization in the JVM; the alternative (to always instantiate a new Integer object every single time an int is auto-boxed) would likely lead to a huge performance impact, both in terms of memory allocation and, perhaps even more importantly, GC pressure.
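To see where the boundary lies, the comparison can be run right at the edges of the guaranteed range. This is a minimal sketch with made-up names; it calls Integer.valueOf() explicitly, since that is the method whose Javadoc documents the caching of values from -128 to 127. I've left out expected output for the values outside that range on purpose: on a default JVM I'd expect distinct instances there, but the upper bound of the cache can be raised (in HotSpot, for example, with the -XX:AutoBoxCacheMax option), so the result is not guaranteed.

```java
class IntegerCacheDemo {
    // Boxes the same int twice and compares the resulting references.
    static boolean boxesToSameInstance(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second; // deliberate identity comparison
    }

    public static void main(String[] args) {
        // Two values just outside the guaranteed cache range, two just inside.
        for (int value : new int[] { -129, -128, 127, 128 }) {
            System.out.println(value
                    + " -> same instance: " + boxesToSameInstance(value)
                    + ", equals: " + Integer.valueOf(value).equals(value));
        }
    }
}
```

Whatever the identity comparison reports, the equals column is true for every value, which is the property application code should rely on.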
Conclusions
Needless to say, this is definitely something that can be a huge gotcha in the current Java version(s), and it's easy for people like me, and probably others as well, to make simple mistakes like this. It's perhaps even easier to make mistakes like this if you regularly work with other languages like C#, which has more "relaxed" semantics (where s1 == s2 will always work, regardless of whether s1 and s2 are string literals or dynamically constructed strings).
Luckily, the good people in the JDK project are actively working on improving this. Quoting the "State of Valhalla Part 2" design note linked below:
Many of the impediments to optimization that Valhalla seeks to remove center around unwanted object identity. The primitive wrapper classes have identity, but not only is this identity not directly useful, it can be a source of bugs. (For example, due to caching, Integers can be accidentally compared correctly with == just often enough that people keep doing it.)
Well, yes - this was exactly what happened to me. Thanks to Sebastian Lövdahl for sharing that link with me after I had cried out to him in agony today.