The Java Explorer

Tips and insights on Java

  • Subscribe

  • If you find this blog useful, please enter your email address to subscribe and receive notifications of new posts by email.

    Join 37 other followers

Comparators and sorted sets

Posted by Eyal Schneider on November 23, 2009


Java suggests two techniques for defining an ordering on a collection of objects. The first is implementing the Comparator interface, as a stand alone comparison mechanism. The second is making the objects comparable, by making their class/es implement the Comparable interface.  

When the ordering is inherent to the class nature, such as with class Integer, implementing the Comparable interface makes sense. In the other cases, we don’t want to add specific ordering logic to the class implementation. Instead, we implement comparators on a need to basis. We may also want to implement multiple comparators for the same class, thus having many ordering schemes.

Regardless of the interface we choose to implement (Comparable/Comparator), there are some important rules to follow. Suppose that we need a class for maintaining a set of points in the plane, ordered by their distance from a fixed point. Let’s first define a class for representing a point:    

public class Point{
    private int x;
    private int y;
    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
    public int getX() {
        return x;
    }

    public int getY() {
        return y;
    }

    public boolean equals(Object other){
        if (!(other instanceof Point))
            return false;
        Point otherPoint = (Point) other;
        return x==otherPoint.x && y==otherPoint.y;
    }

    public int hashCode(){
        return x ^ y;
    }

    public double distanceFrom(Point other){
        int dx = x-other.x;
        int dy = y-other.y;
        return Math.sqrt(dx*dx+dy*dy);
    }

    public String toString(){
        return "("+x+","+y+")";
    }

 }

And now let’s define the OrderedPointSet class, which uses a TreeSet with a comparator for maintaining the sorted set of points efficiently: 


public class OrderedPointSet {
    private SortedSet<Point> points;
  private Point refPoint;
  public OrderedPointSet(final Point refPoint){
        this.refPoint = refPoint;
        points = new TreeSet<Point>(new Comparator<Point>(){
            public int compare(Point p1, Point p2) {
                double dist1 = refPoint.distanceFrom(p1);
                double dist2 = refPoint.distanceFrom(p2);    
                if (dist1>dist2)
                    return 1;
                if (dist2>dist1)
                    return -1;
                return 0;
            }
        });
    }
    public void add(Point p){
        points.add(p);
    }
    //returns the reference point followed by a list of points ordered in ascending distance from it

    public String toString(){
        return refPoint+":"+points.toString();
    }

    //Other methods
    ....
}

We can now test our implementation: 

OrderedPointSet ps = new OrderedPointSet(new Point(0,0));
ps.add(new Point(1,5));
ps.add(new Point(10,10));
ps.add(new Point(5,20));
ps.add(new Point(5,5));
System.out.println(ps);

The output contains all points, with their right order:  

(0,0):[(1,5), (5,5), (10,10), (5,20)]
 

Apparently the job is done. However, the implementation has a major flaw, that can be detected with the following test scenario:  

OrderedPointSet ps = new OrderedPointSet(new Point(0,0));
ps.add(new Point(5,10));
ps.add(new Point(2,11));
ps.add(new Point(5,5));
System.out.println(ps);

We expect to see 3 points in the output, but we see only two:   

(0,0):[(5,5), (5,10)] 

How did this happen? The first two points, though not equal, have the same distance from the reference point  (square root of 125). The TreeSet implementation (as well as other SortedSet implementations) never invokes the equals() method on its items, and relies only on the comparator implementation for determining equality. Therefore, from its point of view, points (5,10) and (2,11) are equal. When trying to add the latter, the collection stays unchanged, because it obeys the contract of Set.add(), which doesn’t permit overriding. As a consequence, another part of Set’s contract is broken: we add two distinct items (according to equals method), and yet one of them is rejected.
The problem originates from the comparator implementation above, which doesn’t define an ordering consistent with equals, as required by the SortedSet documentation. The formal definition is as follows:  

The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.        

Actually, for solving the problem described above it suffices to enforce only one direction of this rule: if two items are equal according to the comparator, they must be equal according to the equals() method. However, failing to comply with the other direction will result in a different violation of the Set interface –  the collection may accept two items despite the fact that they are equal according to equals().
Returning to our example, the fixed version of the comparator enforces consistency with equals by breaking distance ties of distinct points arbitrarily:

public int compare(Point p1, Point p2) {
    double dist1 = refPoint.distanceFrom(p1);
    double dist2 = refPoint.distanceFrom(p2);    
    if (dist1>dist2)
        return 1;    
    if (dist2>dist1)
        return -1;
 
    int dx = p1.getX()-p2.getX();
    if (dx != 0)
        return dx;
    return p1.getY()-p2.getY();
}

Proving that the consistency rule holds for this code is easy:

Let p1,p2 be points.

(Direction 1) Assume that p1.equals(p2). Then, p1.x=p2.x, and p1.y=p2.y. In this case, when running compare(p1,p2), conditions on lines 4 , 6 and 10 will not be met. The returned value is then p1.y-p2.y, which is 0, as requested.
(Direction 2) Assume that compare(p1,p2)=0. That means that the method must have terminated at line 12 (the other return statements never return 0). Therefore we know that p1.y=p2.y. Also, we know that condition on line 10 was not met, hence p1.x=p2.x. In other words, p1 and p2 are identical, so p1.equals(p2) is true, as requested.

Summing up, when implementing a Comparator (or Comparable)  interface intended to be used in some SortedSet implementation (or as a SortedMap key), make sure the comparator complies with the consistency with equals rule. This will guarantee that the collection respects the Set contract, and does not exhibit weird behaviours. After implementing the comparator, it may be wise to prove that the consistency property really holds.

Advertisements

5 Responses to “Comparators and sorted sets”

  1. Seva said

    The solution to use Y to compare points is not clean. Think of a class which extends Point adding Z property to it. Your compare() method will start to fail again.

    In my opinion it’s more generic to do this:

    public int compare(Point p1, Point p2) {
        double dist1 = refPoint.distanceFrom(p1);
        double dist2 = refPoint.distanceFrom(p2);    
        if (dist1>dist2)
            return 1;
        if (dist2>dist1)
            return -1;
        if(p1.equals(p2)) {
          return -1;
        }
        return 0;
    }
    
    • Eyal Schneider said

      Hi Seva :)

      1) I assume that you have a typo and you meant: “if(!p1.equals(p2))”
      Otherwise you still break the consistency with equals rule.

      2) Assuming so, your implementation is more generic, but fails to fully respect the compare(..) contract: (Taken from Comparator.compare Javadoc)
      “The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y” .

      • Seva said

        1) Sure, this is a typo.
        2) Oh, right. Then there is no choice than using voodoo magic. :)

        public int compare(Point p1, Point p2) {
            double dist1 = refPoint.distanceFrom(p1);
            double dist2 = refPoint.distanceFrom(p2);
            if (dist1>dist2)
                return 1;
            if (dist2>dist1)
                return -1;
            if(!p1.equals(p2)) {
              return System.identityHashCode(p1)-System.identityHashCode(p2);
            }
            return 0;
        }
        
        • Eyal Schneider said

          Interesting try…
          But it seems to violate the consistency rule again. identityHashMap() is not guaranteed to be unique (and on 64 bit systems it can bever be). Consequently, you may return 0 for two non-equal points, which have the same distance from the reference point.

  2. […] Comparators and sorted sets […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: