Whenever picking up a new programming language with any kinds of “objects” I typically want to know quickly how it deals with notions of “object equality”. If the language also has “dictionaries” then there could be related subtleties in using objects as safe hash keys. These aspects of a language can lead to rare but hard-to-debug problems if not understood properly. It turns out that in Julia there are some slightly novel twists on this classic topic.
Preview
- Replay a puzzling object comparison example from an earlier tutorial.
- Meet and understand three different “comparators” in Julia, including egal.
Some structs are more equal than others
In an earlier Julia typing tutorial I had the following example, repeated here with minor edits:
Compare
julia> struct Item a::Int; b::Int end
julia> i1 = Item(1, 2)
Item(1, 2)
julia> i2 = Item(1, 2)
Item(1, 2)
julia> i1 === i2 # identical
true
julia> i1 == i2 # equal
true
with
julia> mutable struct MutableItem a::Int; b::Int end # mutable version of 'Item'
julia> mi1 = MutableItem(1, 2)
MutableItem(1, 2)
julia> mi2 = MutableItem(1, 2)
MutableItem(1, 2)
julia> mi1 === mi2 # ok, not idential...
false
julia> mi1 == mi2 # also not equal???
false
The only difference between Item
and MutableItem
is the latter is a mutable variant of the former and yet that appears to change the results of both identity (===
) and equality (==
) queries. I don’t know about you, but the latter case, equality, was more surprising to me: why wouldn’t mi1 and mi2 be equal if i1 and i2 are – after all, the “object content” is the same in each case?
To get a better idea of what’s going on I needed to look carefully at Julia’s definitions of “identity” and “equality”.
Object identity or value equality?
Many languages with “objects” allow me to take memory addresses of objects and essentially consider such addresses as permanent “identities” of the said objects. Think of id()
in Python or &
in C++. In languages that have reasons to make it difficult to obtain objects’ addresses1, one could think of variables/struct fields as opaque “pointers” to objects. If I haven’t provided my own “oid” field for a struct, there really isn’t much else that can be done. Basically, such languages concede that “objects” must live somewhere in computer memory and don’t mind exposing addresses or other handles to their memory locations. Importantly, a replica of an object would necessarily have to live at a different address and hence would be “distinguishable” from the original. Many of us are so accustomed to this that we take it for granted.
Julia designers saw an opportunity to re-examine the entire concept of “equality” and they took it. Julia has (at least) two comparison operators, ===
and ==
. (They are also functions.)
===
is (non-overloadble) “egal”
Operator ===
implements the egal notion of equality described in Henry Baker’s seminal paper “Equal Rights for Functional Objects or, The More Things Change, The More They Are the Same”: x === y
is true if x
and y
are programmatically indistiguishable, i.e. it is impossible to write a program that demonstrates any differences between x
and y
. This operator is a practical implementation of equivalence relation in Julia, as emphasized by Unicode ≡
being a synonym for ===
:
julia> i1 ≡ i2 # equivalent
true
The reason object’s mutability becomes enmeshed in the concept of equality is because mutation is one option for telling objects apart: a program could try to mutate the value it sees via the x
“handle” and look for the same change as seen via y
: if the change is visible then x
and y
are really “the same thing”. Thus, mutable x
and y
would only be considered “egal” if they are truly pointers to the same address. On the other hand, if x
and y
are immutable and happen to point to the same data they are considered indistinguishable irrespective of whether they point to the same address under the covers2.
===
is a built-in operator that cannot be user-defined, therefore it must work for all possible types, built-in and custom. What should it do for composite types? As already discussed, if the objects being compared are immutable, their contents need to be compared and if the objects’ type is composite there may be a need to recurse into individual fields for a “deep” content comparison. The rules that cover all situations correctly are:
- for “plain data” types like
Int64
orFloat64
, compare them in a bitwise fashion; - for composite mutable
x
andy
, compare their memory addresses; - for composite immutable
x
andy
, compare their types first and if those are the same, apply===
recursively on their component fields.
This explains the difference between i1 === i2
and mi1 === mi2
above: it is all indeed due to the type mutability and is by design. And I think this also helps explain why in Julia type mutability is escalated to the level of type definitions.
In addition to higher conceptual clarity, Baker’s definition of equality enables potentially higher runtime performance: indistinguishable objects can be freely copied or shared across different expressions, different threads, etc, allowing compiler optimizations that otherwise would be inhibited by “aliasing”. To quote Baker:
If programming languages distinguish functional/immutable objects from non-functional/mutable objects, and if programs utilize a “mostly functional” style, then such programs will be efficient even in a non-shared-memory (“message-passing”) implementation.
This seems to be well in keeping with Julia’s somewhat-functional nature combined with lofty performance goals.
==
is (overloadable) “value equality”
Operator ==
, on the other hand, provides potentially user-customized notion of “intuitive equality”. For example, integer 1
and floating point 1.0
are perceived as numerically equal in many user contexts even though they are of different types:
julia> typeof(1), typeof(1.0)
(Int64, Float64)
julia> 1 == 1.0
true
In Julia this intuition extends to rational, complex, and other numbers:
julia> map(typeof, (1 + 0im, 1//1, BigInt(1)))
(Complex{Int64}, Rational{Int64}, BigInt)
julia> 1 == 1 + 0im == 1//1 == BigInt(1)
true
Such “across-types” equality is only allowed when it is mathematically exact, however:
julia> 1/3 == 1//3 # 0.3333... can never be exactly equal to "one third"
false
julia> 1/2 == 1//2 # 0.5 can be represented exactly in 64 bits of IEEE 754 format,
true
julia> 1/10 == 1//10 # ...but 0.1 can't be (since mantissa is not a sum of powers of 2)
false
Furthermore, numerical equality brings with it the usual IEEE 754 floating point complications:
julia> NaN === NaN # one NaN is like any other NaN,
true
julia> NaN == NaN # ... but NaNs don't equal to anything, even themselves
false
julia> 0.0 === -0.0 # positive and negative 0.0 are distinguishable in IEEE 754,
false
julia> 0.0 == -0.0 # ...but they are '=='-equal
true
Also unlike ==
, value equality isn’t guaranteed to always return Bool
. For example, if there is no “value” to compare to begin with:
julia> 1 == missing
missing
julia> missing == missing
missing
(for missing
operands, ==
implements three-valued logic, similar to NULL in SQL.)
Because “value equality” is inherently highly type-dependent, ==
may need to be customized (overloaded). It is not something that Julia can do automatically for every possible user type. What it does do, though, is provide default behavior that is very conservative and safe:
if not overloaded, default to
===
(see the definition in operators.jl):==(x, y) = x === y
for base types like
Int64
,Float64
,String
, etc provide overloads that support “intuitive” value equality as we are used to from other langauges: based on numerics, string content, etc.
And now the rest of the Item
/MutableItem
puzzle is clear: because ==
falls back to ===
it will do whatever “egal” concept must do, which in turn depends on whether the type is mutable. For Item
it happens to be what we would expect: recurse into and compare fields a
and b
. To make MutableItem
compare the same way I would need to add an explicit ==
overload myself3:
julia> import Base
julia> Base.:(==)(lhs ::MutableItem, rhs ::MutableItem) = (lhs.a == rhs.a) && (lhs.b == rhs.b)
julia> mi1 == mi2 # now equal
true
Oh, no! What is isequal()
?
An eagle-eyed reader of Julia documentation for ==
would have noticed mentions of something called isequal()
:
Use isequal or === to always get a Bool result.
It is not a “functionally named” equivalent of ==
as you might have thought (and I did at first, reacting to some subconscious stylistic similarities to the likes of isless() in C++). No, it is ==
“fixed” to deal with idiosyncrasies in ==
-comparison applied to floating point values (-0.0, NaN) and missing
:
Similar to ==, except for the treatment of floating point numbers and of missing values. isequal treats all floating-point NaN values as equal to each other, treats -0.0 as unequal to 0.0, and missing as equal to missing. Always returns a Bool value.
Why is this necessary? Primarily to be able to use types with floating point/missing
content as Dict
keys:
isequal is the comparison function used by hash tables (Dict). isequal(x,y) must imply that hash(x) == hash(y).
This typically means that types for which a custom == or isequal method exists must implement a corresponding hash method (and vice versa).
Dictionaries are ubiquituous in Julia and the language is designed so that anything4 could be used as a dictionary key. Having a hashtable comparator that might return missing
is unacceptable. Likewise unacceptable is a value that can never be equal to itself (NaN).
Unsurprisingly enough, isequal()
defaults to ==
except when encountering floating point/missing
values and in fact does not need to be overloaded unless the latter is a runtime possibility. The situation is exactly parallel to, say how Java needs to customize Object.equals() for Doubles – a practical, if not overly elegant, solution.
isequal()
calls ==
which in turn (if not overloaded) defaults to ===
.
Conclusion
Let me put behavioral traits for the three (so far) comparison operations in Julia together for easy comparison:
Operation | Overridable? | Synonyms | Return type | Fallback |
---|---|---|---|---|
isequal() |
yes | Bool |
== |
|
== |
yes | Bool or Missing |
=== |
|
=== |
no | ≡ |
Bool |
And of course, it is always important to remember the role played by type immutability.
- For example, because pointer arithmetic is “bad” or because something like a garbage collector needs to know when an object becomes unreferenced and eligible for destruction. ^
- Assuming, of course, that the language in question doesn’t expose other means of distinguishing memory locations, e.g. by figuring out their addresses. ^
- There is ongoing debate in the community whether Julia could provide such recursive overloads automatically in certain “safe” cases, but there is no universal agreement on what all such cases would be. ^
- Normally I would say that hashing floating point values is asking for trouble, but Julia thinks otherwise. ^