The Java Explorer

Tips and insights on Java

  • Subscribe

  • If you find this blog useful, please enter your email address to subscribe and receive notifications of new posts by email.

    Join 37 other followers

Understanding Generics limitations

Posted by Eyal Schneider on August 14, 2009


Generics were introduced to Java in version 5 (JSR 14), and since then they became standard and very popular. Although there has been a great demand by developers for this feature, many are unsatisfied regarding HOW it was implemented. The design that was adopted has a huge impact on the syntax and semantics of this feature.

Java generics provides the following benefits:

  • Type safety
  • Less explicit casts
  • More declarative APIs

In this post I list some of the limitations of Generics. All of them are derived from two fundamental principles of the Java Generics design:

1) Erasure

Type parameters (e.g. String in  List<String>) are completely removed by the compiler. Formal type parameters (e.g. E in the source code of List) are replaced by their bound (usually Object). As a result, the bytecode has no generic information at all, consequently it is not available at runtime. Unlike C++, at runtime Java shares the same class for different parameterized instantiations of it. These facts have three interesting benefits:

1) JVM implementations prior to version 5 can theoretically stay unchanged in order to support Generics
2) When declaring a parameterized version of a class C, we don’t need the source code of the generic type C
3) The volume of the bytecode does not increase as we add more parameterized types of the same generic class

2) Type safety guarantee
According to Gilad Bracha, the Java Generics lead designer, if your entire application has been compiled without unchecked warnings, it is type safe. “Type safe” in this case means that there will be no unexpected class cast exception, i.e. if a ClassCastException occurs, then it must be caused by an explicit cast in the code. This principle is very important – we don’t want the implicit casts added when compiling generic code to raise runtime exceptions, since they would be hard to understand and fix. 

 

Formal type parameters usage

You can’t expect the formal type parameter to be used wherever a type name can fit. For example, instantiating an object from a formal type parameter is not allowed:

public class NonSense <T>{
    T obj = new T() ; //Illegal
}

Due to erasure, the compiler must generate code for NonSense class to be shared among all instantiations, but it doesn’t know anything about T at this point, and can’t assume anything.
The equivalent code in C++ IS allowed, since the compiler creates a dedicated class per parameterized type of NonSense.
Note that replacing the line above with List<T> obj = new ArrayList<T>() would compile, since the compiler generates code for creating an ArrayList, and the formal type parameter is erased.

For the same reasons, the following code does not compile:

public class Something <T> extends T{
}
 

Hierarchy of parameterized types

Two different instantiations (with different type parameter) of the same parameterized type don’t have any hierarchy relation. For example,  List<String> is not a List<Object>. The following code snippet illustrates what would have happened if this were allowed:

List<String> ls = new ArrayList<String>();
ls.add("one");
ls.add("two");
List<Object> lo = ls; //Illegal
lo.add(new Integer(3)); //The list ls is not homogeneous anymore!
 

Arrays

According to the intuition, arrays should follow similar rules as collections (such as ArrayList) do, when it comes to Generics. The reality is a little painful, however. It is not allowed to create arrays of elements which are parameterized types (e.g. new ArrayList<Strings>[10]). If it were allowed by the compiler, we could have done something like:
 

1.    ArrayList<Integer>[] arrOfIntLists = new ArrayList<Integer>[10]; //Illegal
2.    Object[] arrOfObj = arrOfIntLists;

3.    ArrayList<String> listOfStrings = new ArrayList<String>();
4.    listOfStrings.add("hello");
5.    arrOfObj[0] = listOfStrings;

6.    ArrayList<Integer> list = arrOfIntLists[0];
7.    Integer num = list.get(0);

What happens here?

1) An array of ArrayList of Integers is created.
2) arrOfObj now points to the same array. This is a legal up-cast, since any array is also an Object array. There is no warning in this line.
3,4,5) A list of Strings is created and added to arrOfObj. Note that the addition does not produce an ArrayStoreException, since the runtime type of the array components is simply ArrayList (Due to erasure), and that’s exactly what we are storing.
6) We retrieve the list at position 0 in arrOfIntLists. There is no ClassCastException yet because the implicit runtime cast is a cast to ArrayList, which is the correct class.
7) When trying to retrieve the first item of the list of integers, we fail with a ClassCastException, caused by the implicit cast to Integer.

If this code were legal, it means that the type safety guarantee is broken. The code compiles with no warnings, but at runtime it results in an unexpected ClassCastException in the last line. This happens because we managed to store a string list in an array of integer lists. 

So now we know why creating an ArrayList<Integer>[] is illegal.  But why does the compiler allow creating an instance of ArrayList<ArrayList<Integer>>? Shouldn’t it lead to the same type safety problem? The answer is no. If we convert the example above to the analogical version that uses ArrayList instead of arrays, the second line will not pass compilation, since ArrayList<ArrayList<Integer>> is not a subtype of ArrayList<Object> (See “Hierarchy of parameterized types” above).

The same phenomenon may occur with generic methods that receive generic arrays:

public class MyUtils{ 
    public static <T> void fill(T[] arr, T value){
        for(int i=0;i<arr.length;i++)
            arr[i]=value;
    } 
}

This mini-utility looks type safe, but it isn’t. Surprisingly, the following code compiles with no errors/warnings:

String[] strArr = new String[10];
fill(strArr,3);

Why does this happen? When a generic method is called this way, the compiler performs type inference for determining what type T stands for. In our case, this is done by inspecting the parameter types, and finding the lowest common type shared by them in the class hierarchy. With fill(strArr,”hi”), the type is obviously String. However, in the example above the most specific type shared by strArr and 3 (which is converted to an Integer) is Object. This causes the code to be legal, but to fail with ArrayStoreException on runtime. Technically, this is not a violation of the type safety guarantee explained in the introduction, because the guarantee deals only with ClassCastException.

There are two simple ways to avoid this problem. The first is to be more specific with the call:

MyUtils.<String>(strArr,3) //Now it does not compile

The second is to fix the method’s signature as follows:

public static <T,S extends T> void fill(T[] arr, S value) 
 

Latent Typing

In C++, the implementation of a parameterized type can assume anything about the type parameter/s. For example, the implementation of MyList<E> may call any method on E, and the availability of this method is checked against the actual type parameters of the different instantiations at compile time.  Java does not allow it.  Due to the generics design, the compiler has to generate the full bytecode of the generic class, regardless of the instantiations, so calls to methods that are not bound to any class/interface can not be compiled.

Therefore, in Java you are encouraged to declare explicitly all your assumptions on a type parameter. This is done using bounds. For example:

class DrawableSortedSet<T extends Comparable<T>,Drawable>{
....
}

This clearly makes the code more declarative, but proponents of Latent Typing argue that the extra declarations are not really needed for runtime type safety (C++ approach proves this), and also that the declarative approach may limit the programmer and force him to define new artificial interfaces containing the desired methods.  

 

Bounds restrictions

As explained before, bounds allow us to specify constraints on the type parameters being used, both in a generic class or a generic method. These prerequisites give us more freedom. For example, assume that we want a method that manipulates a list of numbers.  We start by the following generic method:

public void manipulate(List<Number> list){
  ....
}

The problem here is that List<Integer> can not be passed as parameter. Only List<Number> is allowed (see “Hierarchy of parameterized types” above). If we want to pass lists of any descendant class of Number, we should use bounds:

public <T extends Number> void manipulate(List<T> list){ 
    ....
}

Now we can pass List<Integer>, List<Float> and so on,  but we face a new limitation. While in the previous version of the method we could call list.add(new Integer(10)) or list.add(new Double(3.2)), now the compiler will not accept it, because it can’t guarantee that the addition maintains the type homogeneousness in the list. If addition of Integers were allowed, then calling the method with a variable declared as List<Float> may result with a new non Float item in the list, and this is a violation of the type safety guarantee.

 

Static data members

Generic classes are not allowed to have references to their type parameters from their static data members or methods. For example, the following is illegal:

public MyClass<T>{
    private static T value; //Illegal

    public static void analyze(List<T> list){ //Illegal
        ....
    }
}

C++ programmers who learn Java may be tempted to do it, because in C++ this is legal. Remember that unlike C++, Java uses the same class at runtime for MyClass<Integer>, MyClass<String> etc. Static members, by definition, are bound to the class rather than to the instance. It follows from these 2 facts that static members are actually shared by all instances of all generic instantiation of the same generic type. If the code above were allowed, type safety could not be guaranteed anymore. Imagine for example what happens when an instance of MyClass<Integer> stores an Integer in value, and then an instance of MyClass<String> retrieves it as a String.

 

Generic information at runtime 

Because of erasure, there is no sense in checking generic information at runtime. For example, the following is illegal:

if (list instanceof List<String>){ ....}

 

Primitive types as parameter types

Unlike C++, Java does not allow primitive types to be type parameters. This limitation is not so significant, since we can always replace the primitive type with the corresponding wrapper, and we can also use auto-boxing/auto-unboxing in the access methods to simplify the code. However, boxing/unboxing can lead to a performance penalty. In order to eliminate it there is no choice but to code a new, dedicated version of the class for the specific primitive type.

 

Generic exceptions

It is not allowed to define a generic type that extends (directly/indirectly) some Throwable. Trying to do so implies that the programmer wishes to handle different parameterized versions of the same exception at runtime, such as in the following example:

try{
    ....
}catch(MyException<Byte>){
    ....
}catch(MyException<Integer>){
    ....
} 

The problem is that erasure does not allow it.  There is no way for the JVM to distinguish at runtime between MyException<Byte> and MyException<Integer>. They are both simply MyException at runtime. Therefore there is no sense in having generic exceptions.

 

Generic enums

Defining generic enums is not allowed: 

enum State <T>{ //Illegal
    STANDING,WALKING,RUNNING;

     private T attribute;
     public void setAttribute(T attribute){
         this.attribute=attribute;
     }

     public T getAttribute(){
        return attribute;
    }
}

The reason for this to be illegal is related to the static members limitation (see “Static data members” above). Enums are implemented according to the “Type Safe Enum” design pattern, where the enum values are static data members of the class. Therefore, they are not allowed to have any reference to the class’ type parameter in their declarations.

 

Conclusion

Generics are a big step forward for Java, making it support a popular programming style common to Ada, C++, C# and other languages. However, simple intuition is not enough for really understanding Java generics. Sometimes one has to go under the hood of the compiler in order to understand its limitations and use it correctly. There has been a proposal of re-working the design by canceling erasure (“Reified Generics”), but it seems like this idea is no longer in the feature list of Java 7 (see http://tech.puredanger.com/java7).

 

Links

http://www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html
http://www.angelikalanger.com/GenericsFAQ/FAQSections/Fundamentals.html
http://mindview.net/WebLog/log-0051
http://java.sun.com/j2se/1.5/pdf/generics-tutorial.pdf
http://www.ibm.com/developerworks/java/library/j-jtp01255.html
http://www.devarticles.com/c/a/Java/Generics-and-Limitations-in-Java/
http://java.sun.com/developer/technicalArticles/J2SE/generics/#TJVM

Advertisements

3 Responses to “Understanding Generics limitations”

  1. […] Understanding Generics limitations […]

  2. honey said

    very good generic explaination major of my concept are now clear

  3. sivaram said

    Very good article

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: