The Java Explorer

Tips and insights on Java

  • Subscribe

  • If you find this blog useful, please enter your email address to subscribe and receive notifications of new posts by email.

    Join 37 other followers

Posts Tagged ‘bug’

Comparators – part 2

Posted by Eyal Schneider on May 19, 2010

In the post Comparators and sorted sets I discuss the “consistency with equals” rule, and the importance of following it when working with sorted sets.  Lately I saw a few cases of another faulty comparator implementation.  

As usual, I will demonstrate it with an example. Suppose we have a bank account class that encapsulates, among other things, the account id and the balance, in cents:  

 

public class Account {
    private int accountId;
    private int balanceCents;
 
    public Account(int accountId, int cents) {
        this.accountId = accountId;
        this.balanceCents = cents;
    }
 
    public int getBalanceCents() {
        return balanceCents;
    }
 
    public int getAccountId() {
        return accountId;
    }
 
    public String toString() {
        return accountId + ":" + balanceCents;
    }
}

We want to be able to sort accounts in ascending balance order, so we implement a comparator for that purpose:  

 

private class BalanceComparator implements Comparator<Account> {
    public int compare(Account o1, Account o2) {
        return o1.getBalanceCents() - o2.getBalanceCents();
    }
}

Since the compare method should return a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second, we simply subtract the balances in order to determine which one is greater. 

Let’s test this implementation: 

 

Account[] items = {new Account(1,100),new Account(2,-200), new Account(3,1000),new Account(4,0)};
Arrays.sort(items, new BalanceComparator());
System.out.println(Arrays.toString(items));

The output is ordered correctly:

[2:-200, 4:0, 1:100, 3:1000]

However, there are some cases where Arrays.sort() fails to sort the array correctly. This happens with the following input array for example:


Account[] items = {new Account(1,2000000000),new Account(2,-2000000000), new Account(3,1000),new Account(4,0)};

Now the output is not as expected:

[1:2000000000, 2:-2000000000, 4:0, 3:1000]

What did we do wrong? The flaw lies in the comparator implementation. Although it is very tempting to use the elegant one-liner implementation instead of condition statements, it turns out to be wrong in our case. Like all numeric primitives, ints have a limited range (-2^31 to 2^31-1), so subtracting two values may result in an arithmetic overflow. When this happens, no runtime exception is thrown, and the returned value has a wrong magnitude and sign. For example, if we subtract the int -2147483648 from the int 2147483647 we get -1 (which is the unsigned interpretation of the correct result: 2^32-1). In the code above, a wrong sign means wrong decisions of the sorting algorithm.

If we know for sure that an int overflow is impossible (e.g. if the numbers being compared are always positive ints, or if we compare bytes), then we can use the one line comparison technique. In all other cases, we must use explicit conditions:

public int compare(Account o1, Account o2) {
    int balance1 = o1.getBalanceCents();
    int balance2 = o2.getBalanceCents();   
    if (balance1 > balance2)
        return 1;
    if (balance1 < balance2)
        return -1;
    return 0; 
}

Advertisements

Posted in java | Tagged: , , , , , , | 1 Comment »

Strings and Memory leaks

Posted by Eyal Schneider on October 27, 2009

java.lang.String is one of the mostly used classes in Java. Interestingly,  it is important to know some of its implementation details in order to avoid memory leaks.
Consider the following seemingly innocent piece of code:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

 

public class StrangeStringThing {
    private static Random rnd = new Random();
    private static final int MEGA = 1024*1024;
 
    private static String createRandomStr(int length){
        StringBuilder sb = new StringBuilder(length);
        for(int i=0;i<length;i++)
            sb.append((char)('a'+rnd.nextInt('z'-'a'+1)));
        return sb.toString();
    }
 
    public static void main(String[] args){
        List<String> substrings = new ArrayList<String>();
        for(int i=0;i<100;i++){
            String randomStr = createRandomStr(MEGA);
            String subStr = randomStr.substring(1000,1004);
            substrings.add(subStr);
        }
    }
}

This code generates a very long random string, and then keeps only a 5 chars long substring of it. This process is repeated 100 times.  We expect to end this program with 100 strings of length 5.
Surprisingly, if we try to run this code with the heap size limited to 100MB (-Xmx100m), we get an OutOfMemoryError.  How did that happen? all temporary large strings should have been cleared by the garbage collection, so there should be a really small memory footprint. Furthermore, the documentation of java.lang.String does not indicate any misuse of the API.

How are Java Strings stored?

The Java language specification defines precisely the API of class String, its Unicode details, its immutability, and the way String literals should be handled. Naturally, it does not define how Strings should be  represented internally, and what are the required time/space complexities of the different operations on them. These aren’t functional requirements. As a consequence, different JVM providers have the freedom to choose their preferred design for class String. Sun’s implementation (and also IBM’s) represent a string with a combination of 3 data members:

  • value – A reference to a char array
  • offset – A start index, inside the array
  • count – The actual length of the string.

When allocating a new string, the char array contains exactly the string content. Offset is set to 0, and count is set to the char array length. Then, when calling the method substring(..) upon it, the new string being returned contains a reference to the same char array, but its offset and count members are modified, reflecting the requested subsequence of chars. The sharing of the same char array by multiple String instances is possible, since strings are immutable. There are two benefits of this implementation approach in comparison to a substring implementation based on copying:
1) Memory usage is usually reduced, specially in cases where many substrings of the same string are taken, or if the substrings are long
2) substring(..) runs in constant time, instead of linear time

 Obviously there is a tradeoff here – if the string utilization pattern is not as described in (1), we may suffer from excessive memory consumption. The code above demonstrates this edge case: we take very small substrings of very large strings, where the latter are temporary. The temporary strings (referenced by randomStr) ARE collected by the garbage collector during the loop. However, their internal char arrays can not be collected, since they are being shared with the sub-strings. Therefore, the very long char arrays are kept unintentionally in memory, and that constitutes a memory leak.

The solution is simple – forcing the substrings to be copied into a new compact char array. This is done by replacing line 22 above with:

  String subStr = new String(randomStr.substring(1000,1004));

It looks like the API designers were aware of the the possibility of substring being implemented as described, so they added a copy constructor specifically for avoiding the memory leak. Note that otherwise this constructor is useless, because we are dealing with an immutable class.

Is this a Java bug?

There is currently an open bug report in Sun’s bug database regarding this issue.  It doesn’t look like it will be fixed soon, because doing so may introduce performance regression in many existing programs. Besides, there is a very simple workaround (described above), what makes the priority of this bug relatively low.

Summary

Java developers should be aware of the prevalent String.substring(..) implementation, and its memory leak potential risk. Whenever we know that the original string we take the substring from has a shorter life span than the substring itself, we should consider using the copy constructor to avoid sharing of the underlying char array.

Posted in java, JDK packages | Tagged: , , , , , , , | 5 Comments »