The Java Explorer

Tips and insights on Java

  • Subscribe

  • If you find this blog useful, please enter your email address to subscribe and receive notifications of new posts by email.

    Join 36 other followers

Posts Tagged ‘Comparable’

Comparators – part 3

Posted by Eyal Schneider on April 30, 2012

In this post and in this one, I illustrate how a bad implementation of comparators can cause annoying bugs.
Lately I saw a strangely looking Comparator implementation, which looks like this:

class WeirdComparator implements Comparator<Long> {
    @Override
    public int compare(Long a, Long b) {
        if (a == null || b == null)
            return 0;
        if (a > b)
            return 1;
        if (b > a)
            return -1;
        return 0;
    }
}

The implementor decided not to rely on the natural ordering of Long, since it doesn’t handle null values, which are acceptable in his use-case.

The first two lines caught my eye: they are obviously wrong in terms of the Comparator contract, but it is not immediately clear what kind of damage they can cause. The contract of Comparator/Comparable states that the implementation must define a total order on all values of the type it handles. That means, among other things, the transitivity property:
If compare(b,a)>=0 and compare(c,b)>=0, then compare(c,a)>=0. With the current implementation, compare(null, 2L)>=0 and compare(1L, null)>=0, but compare (1L, 2L)<0.

What happens if we sort using this comparator?

    Long[] arr = {100L, 101L, 102L, 103L,
                  50L , null,  51L,  52L};
    Arrays.sort(arr,new WeirdComparator());
    System.out.println(Arrays.toString(arr));

The output in this case is a mess: “[50, 100, 101, 102, 103, null, 51, 52]”.

Java uses the Mergesort algorithm for sorting arrays of objects, and a variation of the Quicksort algorithm for sorting arrays of primitive types. Java’s Mergesort implementation has an optimization that defines a threshold of 7, such that any array range with a smaller size is not sorted recursively with Mergesort, but rather with Insertion sort.
The array size in the example is 8, so the two halves are sorted with Insertion Sort (leaving them unchanged), and then they are merged. In the merge phase, the value 50 is chosen first, and then the new head of the second half is null, which is defined to be equal to any value. This causes all the contents of the first half to be copied to the output.

The faulty comparator is also a disaster for sorted sets:

    SortedSet<Long> set = new TreeSet<Long>(new WeirdComparator());
    set.add(null);
    set.add(10L);
    set.add(20L);
    System.out.println(set);

Although we add 3 values, only the null value remains. As explained in my other post, TreeSet respects the Set invariants which prohibit duplicate values, and it uses the comparator to determine equality. Therefore, the values 10L and 20L are considered duplicates of null and aren’t added to the tree.

Posted in java | Tagged: , , , , , , , | 1 Comment »

Comparators – part 2

Posted by Eyal Schneider on May 19, 2010

In the post Comparators and sorted sets I discuss the “consistency with equals” rule, and the importance of following it when working with sorted sets.  Lately I saw a few cases of another faulty comparator implementation.  

As usual, I will demonstrate it with an example. Suppose we have a bank account class that encapsulates, among other things, the account id and the balance, in cents:  

 

public class Account {
    private int accountId;
    private int balanceCents;
 
    public Account(int accountId, int cents) {
        this.accountId = accountId;
        this.balanceCents = cents;
    }
 
    public int getBalanceCents() {
        return balanceCents;
    }
 
    public int getAccountId() {
        return accountId;
    }
 
    public String toString() {
        return accountId + ":" + balanceCents;
    }
}

We want to be able to sort accounts in ascending balance order, so we implement a comparator for that purpose:  

 

private class BalanceComparator implements Comparator<Account> {
    public int compare(Account o1, Account o2) {
        return o1.getBalanceCents() - o2.getBalanceCents();
    }
}

Since the compare method should return a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second, we simply subtract the balances in order to determine which one is greater. 

Let’s test this implementation: 

 

Account[] items = {new Account(1,100),new Account(2,-200), new Account(3,1000),new Account(4,0)};
Arrays.sort(items, new BalanceComparator());
System.out.println(Arrays.toString(items));

The output is ordered correctly:

[2:-200, 4:0, 1:100, 3:1000]

However, there are some cases where Arrays.sort() fails to sort the array correctly. This happens with the following input array for example:


Account[] items = {new Account(1,2000000000),new Account(2,-2000000000), new Account(3,1000),new Account(4,0)};

Now the output is not as expected:

[1:2000000000, 2:-2000000000, 4:0, 3:1000]

What did we do wrong? The flaw lies in the comparator implementation. Although it is very tempting to use the elegant one-liner implementation instead of condition statements, it turns out to be wrong in our case. Like all numeric primitives, ints have a limited range (-2^31 to 2^31-1), so subtracting two values may result in an arithmetic overflow. When this happens, no runtime exception is thrown, and the returned value has a wrong magnitude and sign. For example, if we subtract the int -2147483648 from the int 2147483647 we get -1 (which is the unsigned interpretation of the correct result: 2^32-1). In the code above, a wrong sign means wrong decisions of the sorting algorithm.

If we know for sure that an int overflow is impossible (e.g. if the numbers being compared are always positive ints, or if we compare bytes), then we can use the one line comparison technique. In all other cases, we must use explicit conditions:

public int compare(Account o1, Account o2) {
    int balance1 = o1.getBalanceCents();
    int balance2 = o2.getBalanceCents();   
    if (balance1 > balance2)
        return 1;
    if (balance1 < balance2)
        return -1;
    return 0; 
}

Posted in java | Tagged: , , , , , , | 1 Comment »

Comparators and sorted sets

Posted by Eyal Schneider on November 23, 2009

Java suggests two techniques for defining an ordering on a collection of objects. The first is implementing the Comparator interface, as a stand alone comparison mechanism. The second is making the objects comparable, by making their class/es implement the Comparable interface.  

When the ordering is inherent to the class nature, such as with class Integer, implementing the Comparable interface makes sense. In the other cases, we don’t want to add specific ordering logic to the class implementation. Instead, we implement comparators on a need to basis. We may also want to implement multiple comparators for the same class, thus having many ordering schemes.

Regardless of the interface we choose to implement (Comparable/Comparator), there are some important rules to follow. Suppose that we need a class for maintaining a set of points in the plane, ordered by their distance from a fixed point. Let’s first define a class for representing a point:    

public class Point{
    private int x;
    private int y;
    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
    public int getX() {
        return x;
    }

    public int getY() {
        return y;
    }

    public boolean equals(Object other){
        if (!(other instanceof Point))
            return false;
        Point otherPoint = (Point) other;
        return x==otherPoint.x && y==otherPoint.y;
    }

    public int hashCode(){
        return x ^ y;
    }

    public double distanceFrom(Point other){
        int dx = x-other.x;
        int dy = y-other.y;
        return Math.sqrt(dx*dx+dy*dy);
    }

    public String toString(){
        return "("+x+","+y+")";
    }

 }

And now let’s define the OrderedPointSet class, which uses a TreeSet with a comparator for maintaining the sorted set of points efficiently: 


public class OrderedPointSet {
    private SortedSet<Point> points;
  private Point refPoint;
  public OrderedPointSet(final Point refPoint){
        this.refPoint = refPoint;
        points = new TreeSet<Point>(new Comparator<Point>(){
            public int compare(Point p1, Point p2) {
                double dist1 = refPoint.distanceFrom(p1);
                double dist2 = refPoint.distanceFrom(p2);    
                if (dist1>dist2)
                    return 1;
                if (dist2>dist1)
                    return -1;
                return 0;
            }
        });
    }
    public void add(Point p){
        points.add(p);
    }
    //returns the reference point followed by a list of points ordered in ascending distance from it

    public String toString(){
        return refPoint+":"+points.toString();
    }

    //Other methods
    ....
}

We can now test our implementation: 

OrderedPointSet ps = new OrderedPointSet(new Point(0,0));
ps.add(new Point(1,5));
ps.add(new Point(10,10));
ps.add(new Point(5,20));
ps.add(new Point(5,5));
System.out.println(ps);

The output contains all points, with their right order:  

(0,0):[(1,5), (5,5), (10,10), (5,20)]
 

Apparently the job is done. However, the implementation has a major flaw, that can be detected with the following test scenario:  

OrderedPointSet ps = new OrderedPointSet(new Point(0,0));
ps.add(new Point(5,10));
ps.add(new Point(2,11));
ps.add(new Point(5,5));
System.out.println(ps);

We expect to see 3 points in the output, but we see only two:   

(0,0):[(5,5), (5,10)] 

How did this happen? The first two points, though not equal, have the same distance from the reference point  (square root of 125). The TreeSet implementation (as well as other SortedSet implementations) never invokes the equals() method on its items, and relies only on the comparator implementation for determining equality. Therefore, from its point of view, points (5,10) and (2,11) are equal. When trying to add the latter, the collection stays unchanged, because it obeys the contract of Set.add(), which doesn’t permit overriding. As a consequence, another part of Set’s contract is broken: we add two distinct items (according to equals method), and yet one of them is rejected.
The problem originates from the comparator implementation above, which doesn’t define an ordering consistent with equals, as required by the SortedSet documentation. The formal definition is as follows:  

The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.        

Actually, for solving the problem described above it suffices to enforce only one direction of this rule: if two items are equal according to the comparator, they must be equal according to the equals() method. However, failing to comply with the other direction will result in a different violation of the Set interface –  the collection may accept two items despite the fact that they are equal according to equals().
Returning to our example, the fixed version of the comparator enforces consistency with equals by breaking distance ties of distinct points arbitrarily:

public int compare(Point p1, Point p2) {
    double dist1 = refPoint.distanceFrom(p1);
    double dist2 = refPoint.distanceFrom(p2);    
    if (dist1>dist2)
        return 1;    
    if (dist2>dist1)
        return -1;
 
    int dx = p1.getX()-p2.getX();
    if (dx != 0)
        return dx;
    return p1.getY()-p2.getY();
}

Proving that the consistency rule holds for this code is easy:

Let p1,p2 be points.

(Direction 1) Assume that p1.equals(p2). Then, p1.x=p2.x, and p1.y=p2.y. In this case, when running compare(p1,p2), conditions on lines 4 , 6 and 10 will not be met. The returned value is then p1.y-p2.y, which is 0, as requested.
(Direction 2) Assume that compare(p1,p2)=0. That means that the method must have terminated at line 12 (the other return statements never return 0). Therefore we know that p1.y=p2.y. Also, we know that condition on line 10 was not met, hence p1.x=p2.x. In other words, p1 and p2 are identical, so p1.equals(p2) is true, as requested.

Summing up, when implementing a Comparator (or Comparable)  interface intended to be used in some SortedSet implementation (or as a SortedMap key), make sure the comparator complies with the consistency with equals rule. This will guarantee that the collection respects the Set contract, and does not exhibit weird behaviours. After implementing the comparator, it may be wise to prove that the consistency property really holds.

Posted in java, JDK packages | Tagged: , , , , , , | 5 Comments »