The Java Explorer

Tips and insights on Java

Parameters and returned values

Posted by Eyal Schneider on February 11, 2010

When designing a class, we often have to decide how to protect it against sloppy or malicious clients of our API. One of the major risks comes from method parameters and returned values: the references passed in or handed out through them offer a “backdoor” by which the client can mess with the class’ state in an uncontrolled manner.

The problem

To illustrate the problem, consider the following example, consisting of a movie collection managed by a video library.
A movie is represented as a simple bean:

public class Movie {
    private String title;
    private Genre genre;

    public Movie(){
    }

    public Movie(String title, Genre genre) {
        this.title = title;
        this.genre = genre;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public Genre getGenre() {
        return genre;
    }

    public void setGenre(Genre genre) {
        this.genre = genre;
    }

    public int hashCode(){
        return (title == null? 0 : title.hashCode()) ^ (genre == null? 0 : genre.hashCode());
    }

    public boolean equals(Object other){
        if (!(other instanceof Movie))
            return false;
        Movie otherMovie = (Movie)other;
        return (title != null? title.equals(otherMovie.title) : otherMovie.title == null) &&
                genre == otherMovie.genre;
    }
}

The video library class manages a set of movies, allowing efficient retrieval of all movies for a given genre:

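import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class VideoLibrary {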
    private Set<Movie> movies = new HashSet<Movie>();
    private Map<Genre,Set<Movie>> moviesByGenre = new HashMap<Genre,Set<Movie>>();

    public void addMovie(Movie movie){
        if (movies.add(movie)){
            Genre genre = movie.getGenre();
            Set<Movie> genreMovies = moviesByGenre.get(genre);
            if (genreMovies == null){
                genreMovies = new HashSet<Movie>();
                moviesByGenre.put(genre, genreMovies);
            }
            genreMovies.add(movie);
        }
    }

    public Set<Movie> getByGenre(Genre genre){
        Set<Movie> movies = moviesByGenre.get(genre);
        if (movies == null)
            movies = Collections.emptySet();
        return movies;
    }
}

For simplicity, the implementation assumes a single genre per movie.
If we inspect the two data members and the logic associated with them, we can clearly see that they follow an “internal contract” (or class invariant):
1) The set movies and the union of all movie sets inside moviesByGenre are identical (i.e. contain the same set of movies).
2) For every pair (K,V) in moviesByGenre, all movies in V have the same genre K.
When this invariant is broken, the object is in an inconsistent state, and can’t respect the contract with its clients anymore.
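To make the invariant concrete, here is a sketch of a self-check one could add to VideoLibrary, for example to call from unit tests. The method name invariantHolds(), the Genre values and the condensed Movie below are assumptions made for illustration; they are not part of the original code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

enum Genre { CRIME, COMEDY, DRAMA } // assumed genre values

class Movie { // condensed version of the mutable bean above
    private String title;
    private Genre genre;

    Movie(String title, Genre genre) { this.title = title; this.genre = genre; }
    Genre getGenre() { return genre; }
    void setGenre(Genre genre) { this.genre = genre; }

    @Override public int hashCode() { return Objects.hash(title, genre); }
    @Override public boolean equals(Object o) {
        return o instanceof Movie && Objects.equals(title, ((Movie) o).title)
                && genre == ((Movie) o).genre;
    }
}

class VideoLibrary {
    private final Set<Movie> movies = new HashSet<>();
    private final Map<Genre, Set<Movie>> moviesByGenre = new HashMap<>();

    void addMovie(Movie movie) {
        if (movies.add(movie))
            moviesByGenre.computeIfAbsent(movie.getGenre(), g -> new HashSet<>()).add(movie);
    }

    Set<Movie> getByGenre(Genre genre) {
        Set<Movie> result = moviesByGenre.get(genre);
        return result == null ? Collections.<Movie>emptySet() : result;
    }

    // Hypothetical self-check for both parts of the class invariant.
    boolean invariantHolds() {
        Set<Movie> union = new HashSet<>();
        for (Map.Entry<Genre, Set<Movie>> entry : moviesByGenre.entrySet()) {
            for (Movie m : entry.getValue()) {
                if (m.getGenre() != entry.getKey())
                    return false;                  // part (2) is broken
                union.add(m);
            }
        }
        // Part (1): compare freshly hashed copies, so stale hash codes cannot hide a mismatch.
        return union.equals(new HashSet<>(movies));
    }
}
```

Rebuilding both sides into fresh HashSets is deliberate: if an element's hash code changed after insertion, lookups in the original sets are no longer reliable.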

Now, consider the following usage of the class:

    VideoLibrary lib = new VideoLibrary();
    Movie pulpFiction = new Movie("Pulp Fiction",Genre.CRIME);
    lib.addMovie(pulpFiction);
    pulpFiction.setGenre(Genre.COMEDY);
    //What will the next call return?
    Set<Movie> crimeMovies = lib.getByGenre(Genre.CRIME);

The fact that the caller has references to mutable objects inside lib allows him to modify the internal state of the library. In this example we violated item (2) of the invariant, ending up with an inconsistent library, where a movie tagged as a comedy is returned for a query on crime movies. Furthermore, we have also corrupted the movies data member, because we modified items in the HashSet after adding them (i.e. the HashSet invariant regarding the hash code of entries was also broken).
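The HashSet corruption can be observed in isolation. The sketch below uses a hypothetical, stripped-down Movie whose genre is a plain String (so that the hash codes are deterministic) and changes a field that hashCode() depends on after the movie has been stored.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Movie { // hypothetical, stripped-down mutable movie
    String title;
    String genre;

    Movie(String title, String genre) { this.title = title; this.genre = genre; }

    @Override public int hashCode() { return Objects.hash(title, genre); }
    @Override public boolean equals(Object o) {
        return o instanceof Movie && Objects.equals(title, ((Movie) o).title)
                && Objects.equals(genre, ((Movie) o).genre);
    }
}

class HashCorruptionDemo {
    // Stores the movie in a HashSet, then changes a field that hashCode() depends on.
    static Set<Movie> storeThenMutate(Movie movie, String newGenre) {
        Set<Movie> set = new HashSet<>();
        set.add(movie);
        movie.genre = newGenre; // the set still files the entry under the old hash code
        return set;
    }
}
```

With "CRIME" changed to "COMEDY" (two strings with different hash codes), iterating the set still yields the movie, yet contains(movie) returns false, and add(movie) succeeds a second time, leaving the same object in the set twice.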
We can also cause damage by manipulating a returned value instead of a method parameter. In the following example we break both components of the invariant:

    Set<Movie> crimeMovies = lib.getByGenre(Genre.CRIME);
    crimeMovies.add(new Movie("The Big Lebowski",Genre.COMEDY));
    //What will the next call return?
    crimeMovies = lib.getByGenre(Genre.CRIME);

Note that in both examples we managed to ruin the library’s state without even using its API. We never violated its methods’ preconditions, and yet we have damaged it.
Generally speaking, changing an object’s state by actions external to its API is not necessarily a problem. When we use an ArrayList for example, we can freely mutate the state of the stored objects. So how do the ArrayList and VideoLibrary differ? They differ in their invariants. While ArrayList is indifferent to the contents of the objects added to it, VideoLibrary can’t tolerate changes to Movie objects stored in it, since they may violate the consistency of its data members.

Summing up, any class that exposes internal references through method parameters or returned values puts its data integrity at risk if both of the following are true:
1) The exposed references belong to mutable classes
2) Mutation of these objects can break the invariant of the class

In C++, it is common to solve this problem by using the const qualifier on parameters/returned values. Used wisely, it lets us share an internal reference with the API user without putting data integrity at risk. Of course, the user of the API can still do damage by explicitly casting away the constness (for example with const_cast), so the solution is not perfect.
Java’s final qualifier is merely equivalent to a constant pointer in C++ (e.g. char * const); unfortunately there is no Java equivalent to C++’s constant referent (e.g. const char *).
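A small sketch of what final does and does not protect; the class name FinalReferenceDemo is hypothetical. The field's reference can never be reassigned, but the list it points to remains fully mutable.

```java
import java.util.ArrayList;
import java.util.List;

class FinalReferenceDemo {
    // 'final' fixes the reference (like char * const), not the object it refers to.
    private static final List<String> TITLES = new ArrayList<>();

    static int addAndCount(String title) {
        TITLES.add(title);              // allowed: the referent is still mutable
        // TITLES = new ArrayList<>();  // would not compile: the reference itself is final
        return TITLES.size();
    }
}
```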
It is interesting to note that const is a reserved keyword in Java, but the language does not use it, and any use of it is a compile error. I guess that the language designers wanted to keep the option of introducing some const mechanism in a future version of Java…
Meanwhile, we have a variety of other techniques for protecting the data members of our classes. The following sections describe some of them, with examples from the video library use case.

Documentation

Maybe the easiest way of dealing with the problem is to return the responsibility back to the client of our class. This requires providing method documentation that clearly states what the client can’t do with the parameters/returned values.
The class java.util.HashMap, for example, takes this approach. When a client adds a pair to the map, he must make sure that the key is not altered as long as the pair is in the map; otherwise the hash code of the key may change, and the map can no longer guarantee correctness. Unfortunately, the documentation of HashMap.put(..) does not mention this; the warning about mutable keys appears only in the class-level documentation of the java.util.Map interface.

In the case of the video library, we should add the proper documentation to VideoLibrary.addMovie(..) and VideoLibrary.getByGenre(..).
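Such documentation might look like the sketch below. The wording is only a suggestion, and the interface VideoLibraryContract, together with the stub Movie and Genre types, are hypothetical stand-ins that keep the example self-contained.

```java
import java.util.Set;

enum Genre { CRIME, COMEDY, DRAMA } // assumed genre values

class Movie { } // stub standing in for the mutable Movie bean shown earlier

// Hypothetical interface, used only to show where the ownership rules would be documented.
interface VideoLibraryContract {
    /**
     * Adds a movie to the library.
     * <p>
     * Ownership of {@code movie} passes to the library: after this call the caller
     * must not modify it (for example via {@code setTitle} or {@code setGenre}).
     */
    void addMovie(Movie movie);

    /**
     * Returns the movies of the given genre.
     * <p>
     * The returned set and the movies in it remain owned by the library: callers must
     * treat them as read only and must not add, remove or modify any of them.
     */
    Set<Movie> getByGenre(Genre genre);
}
```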

While this approach requires no coding and has no performance cost, it has two major disadvantages:

1) It requires programmers to be aware of “object ownership”. In our example, once we add a movie object to the library, the library becomes the “owner” of the movie object. The caller transfers ownership and is no longer allowed to mutate the movie. Furthermore, the caller is not allowed to pass the movie to another method that assumes ownership of its parameter, and so on.
This ownership awareness is very similar to the way we manage object disposal in languages such as C++, where no garbage collector is available. There, however, ownership specifies who may dispose of the object, rather than who has write permission on it.

2) It is very easy for a programmer to ignore the ownership rules by mistake, and once the “unauthorized” write occurs at runtime, the problem may manifest itself only at a much later stage, possibly in unrelated parts of the code. This makes this kind of bug very difficult to analyze.

Defensive copies

A very common approach is to hide internal references completely from the API user, by copying parameters right after receiving them, and copying returned values just before returning them. Cloneable objects make things easier. We will start by making Movie cloneable:

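public class Movie implements Cloneable {
    // ... fields, constructors and accessors as before ...

    @Override
    public Movie clone() {
        try {
            return (Movie) super.clone(); // The default (shallow) clone is enough in our case
        } catch (CloneNotSupportedException e) {
            // Cannot happen, since Movie implements Cloneable
            throw new AssertionError(e);
        }
    }
}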

Next we can fix VideoLibrary.addMovie(..) as follows:

    public void addMovie(Movie movie){
        movie = movie.clone(); // from here on, work only with the library's private copy
        // ... rest of the method unchanged ...
    }

The method getByGenre(..) has to perform a deep copy, since we want to protect both the collection and the items in it:

public Set<Movie> getByGenre(Genre genre){
    Set<Movie> movies = moviesByGenre.get(genre);
    if (movies == null)
        return Collections.emptySet();

    Set<Movie> res = new HashSet<Movie>();
    for (Movie movie : movies)
        res.add(movie.clone());
    return res;
}

Note that when a method returns an array, a defensive copy is the only option; the other techniques described below do not apply to arrays.
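A minimal sketch of the array case, using a hypothetical Playlist class. Array clone() makes a shallow copy, which is enough here only because String elements are immutable; an array of mutable objects would need its elements copied as well.

```java
class Playlist { // hypothetical class that keeps its state in an array
    private final String[] titles;

    Playlist(String... titles) {
        this.titles = titles.clone(); // copy the parameter on the way in
    }

    String[] getTitles() {
        return titles.clone(); // copy the returned value on the way out
    }
}
```

Copying in the constructor matters even for varargs: a caller who passes an existing array hands over a reference to that very array.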

While defensive copying guarantees a safe API, it carries a performance penalty, especially when working with large collections.

Immutability

When possible, we can make the classes of the parameters/returned values immutable. The main class still shares references with the outside world, but no one can modify their state, so we are safe. If there is a need to mutate such objects,  we simply derive new ones from existing ones (see java.lang.String or java.math.BigInteger for example).
In our example, an immutable version of Movie would be:

public final class Movie {
    private final String title;
    private final Genre genre;

    public Movie(String title, Genre genre) {
        this.title = title;
        this.genre = genre;
    }

    public String getTitle() {
        return title;
    }

    public Genre getGenre() {
        return genre;
    }

    public int hashCode(){
        // ... same as in the mutable version ...
    }

    public boolean equals(Object other){
        // ... same as in the mutable version ...
    }
}

Once Movie is immutable, we can leave the original addMovie(..) implementation unchanged. However, getByGenre(..) requires more attention, because the concrete type being returned (HashSet) is mutable. Here a shallow defensive copy is enough:

public Set<Movie> getByGenre(Genre genre){
    Set<Movie> movies = moviesByGenre.get(genre);
    if (movies == null)
        movies = Collections.emptySet();
    return new HashSet<Movie>(movies);
}

Sometimes we don’t really need the class used as parameter/returned value to be fully immutable; it is enough to make immutable the parts involved in our main class’ invariant. However, by making the whole class immutable we enjoy other benefits of immutable objects:

  • They are easy to analyze and debug
  • There is no need to implement clone()
  • They are always thread safe
  • Their hash code can be cached for improved performance
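Two of these points, deriving a modified copy instead of mutating and caching the hash code, can be sketched on the immutable Movie. The withGenre(..) method, the lazily cached hash field and the Genre values are illustrative additions, not part of the original class.

```java
import java.util.Objects;

enum Genre { CRIME, COMEDY, DRAMA } // assumed genre values

final class Movie {
    private final String title;
    private final Genre genre;
    private int hash; // cached hash code; 0 means "not computed yet"

    Movie(String title, Genre genre) {
        this.title = title;
        this.genre = genre;
    }

    String getTitle() { return title; }
    Genre getGenre() { return genre; }

    // "Mutation" by derivation: returns a new Movie and leaves this one untouched.
    Movie withGenre(Genre newGenre) {
        return new Movie(title, newGenre);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) { // computed lazily; a racy but benign write, as in java.lang.String
            h = (title == null ? 0 : title.hashCode()) ^ (genre == null ? 0 : genre.hashCode());
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Movie))
            return false;
        Movie m = (Movie) other;
        return Objects.equals(title, m.title) && genre == m.genre;
    }
}
```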

As a matter of fact, Joshua Bloch, in his famous book Effective Java, recommends making classes immutable unless there is a very good reason to make them mutable.

Decorator

The Decorator design pattern can be useful for the case of returned values. Instead of returning an internal reference, we return a wrapper that implements the same interface, but slightly modifies the behavior in order to protect our main class’ invariant.
One way of doing this is to use a “restrictive decorator”, which disables all modifier methods by throwing exceptions when they are called.
The class java.util.Collections provides such decorators for sets, maps, lists, sorted maps, sorted sets and general collections. Assuming that we use the immutable version of Movie, the method getByGenre(..) can be rewritten as follows:

public Set<Movie> getByGenre(Genre genre){
    Set<Movie> movies = moviesByGenre.get(genre);
    if (movies == null)
        movies = Collections.emptySet();
    return Collections.unmodifiableSet(movies); // Returns a read only view of movies
}

When applicable, we should prefer immutability, since restrictive decorators provide a somewhat weaker protection: violations are detected at runtime rather than at compile time. However, compared with a defensive copy, which takes time proportional to the collection size, decoration runs in constant time, which makes it a very efficient technique for collections returned from methods.
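The behavior of Collections.unmodifiableSet(..) can be checked with a small sketch; TitleShelf is a hypothetical holder class, and String titles stand in for immutable Movie objects.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class TitleShelf { // hypothetical holder; Strings stand in for immutable movies
    private final Set<String> titles = new HashSet<>();

    void add(String title) { titles.add(title); }

    Set<String> getTitles() {
        return Collections.unmodifiableSet(titles); // O(1) read-only view, nothing is copied
    }
}
```

Calling add(..) on the returned set throws UnsupportedOperationException. Unlike a defensive copy, the view is live: titles added to the shelf later are visible through a view obtained earlier, which is worth stating in the method's documentation.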

As an alternative to a restrictive decorator, one can implement a decorator that allows write access but performs the required data adjustments behind the scenes. Assuming the mutable version of Movie, we can implement an inner class of VideoLibrary for that purpose:

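private class SafeMovie extends Movie {
    private final Movie movie;

    public SafeMovie(Movie movie) {
        this.movie = movie;
    }

    @Override
    public String getTitle() {
        return movie.getTitle();
    }

    @Override
    public void setTitle(String title) {
        boolean removed = removeMovie(movie); // remove while the hash code is still valid
        movie.setTitle(title);
        if (removed)
            addMovie(movie);                  // re-insert under the new title
    }

    @Override
    public Genre getGenre() {
        return movie.getGenre();
    }

    @Override
    public void setGenre(Genre genre) {
        boolean removed = removeMovie(movie);
        movie.setGenre(genre);
        if (removed)
            addMovie(movie);                  // re-insert under the new genre
    }

    // The wrapper's own inherited fields are never set, so hash and compare
    // through the wrapped movie instead
    @Override
    public int hashCode() {
        return movie.hashCode();
    }

    @Override
    public boolean equals(Object other) {
        return movie.equals(other instanceof SafeMovie ? ((SafeMovie) other).movie : other);
    }
}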

This code assumes the existence of a private method removeMovie(Movie), which completely removes a movie from the library and returns a flag indicating whether the movie existed in the first place.
This wrapper can be used whenever returning movie objects from our public methods.
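The removeMovie(Movie) helper is not shown in the post. Below is one possible sketch inside a condensed, self-contained VideoLibrary; the Genre values, the condensed Movie and the contains(..)/genreCount() accessors are assumptions added so the example can be exercised.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

enum Genre { CRIME, COMEDY, DRAMA } // assumed genre values

class Movie { // condensed version of the mutable bean
    private String title;
    private Genre genre;
    Movie(String title, Genre genre) { this.title = title; this.genre = genre; }
    Genre getGenre() { return genre; }
    @Override public int hashCode() { return Objects.hash(title, genre); }
    @Override public boolean equals(Object o) {
        return o instanceof Movie && Objects.equals(title, ((Movie) o).title)
                && genre == ((Movie) o).genre;
    }
}

class VideoLibrary {
    private final Set<Movie> movies = new HashSet<>();
    private final Map<Genre, Set<Movie>> moviesByGenre = new HashMap<>();

    void addMovie(Movie movie) {
        if (movies.add(movie))
            moviesByGenre.computeIfAbsent(movie.getGenre(), g -> new HashSet<>()).add(movie);
    }

    // Must run while the movie's hash code still matches the one it was stored with,
    // which is why SafeMovie calls it before mutating the movie.
    // (Private in the post; package-private here only so the sketch can be exercised.)
    boolean removeMovie(Movie movie) {
        if (!movies.remove(movie))
            return false;                            // not in the library
        Set<Movie> genreMovies = moviesByGenre.get(movie.getGenre());
        genreMovies.remove(movie);                   // non-null while the invariant holds
        if (genreMovies.isEmpty())
            moviesByGenre.remove(movie.getGenre());  // drop empty genre buckets
        return true;
    }

    boolean contains(Movie movie) { return movies.contains(movie); } // hypothetical accessor
    int genreCount() { return moviesByGenre.size(); }                 // hypothetical accessor
}
```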

What about enumerations and iterators as returned values? While the obsolete Enumeration type is read only and safe to use as a returned type, Iterator has the optional ability to remove items (and ListIterator adds the ability to set and add items as well). Therefore, decoration is also relevant for iterators. The Apache Commons project includes a decorator for that purpose: org.apache.commons.collections.iterators.UnmodifiableIterator.
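For readers not using Apache Commons, a restrictive iterator decorator takes only a few lines. ReadOnlyIterator is a hypothetical name, not the Commons class. Since Java 8, Iterator.remove() already has a default implementation that throws UnsupportedOperationException; overriding it explicitly just makes the intent obvious.

```java
import java.util.Iterator;

// Restrictive decorator: forwards the read operations and rejects remove().
final class ReadOnlyIterator<E> implements Iterator<E> {
    private final Iterator<? extends E> delegate;

    ReadOnlyIterator(Iterator<? extends E> delegate) {
        this.delegate = delegate;
    }

    @Override public boolean hasNext() { return delegate.hasNext(); }

    @Override public E next() { return delegate.next(); }

    @Override public void remove() {
        throw new UnsupportedOperationException("read-only iterator");
    }
}
```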

Read only interfaces

Sometimes when designing a class we split the modifier and query operations into separate interfaces. Such classes are “friendly” as returned values, because we can simply return the “read only” interface instead of the class itself. While this is not completely safe, because a malicious client can still cast the object to its modifiable type, it is a reasonable way of stating our intent and of preventing unintentional modifications by the client.
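A sketch of the idea, with hypothetical interface and class names: the query interface is what Shop hands out, while the modifier operations live in a sub-interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

interface ReadableCatalog {                         // query operations only
    List<String> titles();
    boolean contains(String title);
}

interface EditableCatalog extends ReadableCatalog { // adds the modifier operations
    void add(String title);
}

class Catalog implements EditableCatalog {
    private final List<String> titles = new ArrayList<>();

    @Override public List<String> titles() { return Collections.unmodifiableList(titles); }
    @Override public boolean contains(String title) { return titles.contains(title); }
    @Override public void add(String title) { titles.add(title); }
}

class Shop {
    private final Catalog catalog = new Catalog();

    void stock(String title) { catalog.add(title); }

    // Clients only see the query interface, so add() is not available at compile time.
    ReadableCatalog getCatalog() { return catalog; }
}
```

As noted above, a cast such as (EditableCatalog) shop.getCatalog() still reaches add(..), so this states intent rather than enforcing it.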

Listeners

In some cases, we want to share an internal reference with the client without restricting the possible actions on it. Instead, we want to make sure that whenever the referenced object is altered, our main class is adjusted accordingly in order to maintain its invariant. To accomplish this, we can follow the Observer design pattern: the main class registers listeners with all the internal objects it wants to share, and these objects in turn notify the main class whenever their state changes.
In the Java libraries, this technique is very common in Swing. The class JTable, for example, is a visual component based on a tabular data model (TableModel). This duality allows a clear separation between the data itself and its visual representation. The table model is a data member of JTable, and it is shared with the outside world both as a returned value and as a constructor parameter. At construction time, the JTable registers itself as a listener for data changes in the model. Therefore, any changes made by the client to the data model itself are reflected in the UI.
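Applied to the video library, the pattern could look like the following sketch. GenreListener and genreChanged(..) are hypothetical names, and one simplifying assumption changes the original design: movie identity (equals/hashCode) is based on the title only, so a genre change does not invalidate the movie's position in the hash sets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum Genre { CRIME, COMEDY, DRAMA } // assumed genre values

interface GenreListener { // hypothetical observer interface
    void genreChanged(Movie movie, Genre oldGenre);
}

class Movie {
    private final String title;
    private Genre genre;
    private final List<GenreListener> listeners = new ArrayList<>();

    Movie(String title, Genre genre) { this.title = title; this.genre = genre; }
    String getTitle() { return title; }
    Genre getGenre() { return genre; }
    void addGenreListener(GenreListener listener) { listeners.add(listener); }

    void setGenre(Genre newGenre) {
        Genre oldGenre = genre;
        genre = newGenre;
        if (oldGenre != newGenre)
            for (GenreListener listener : listeners)
                listener.genreChanged(this, oldGenre); // notify observers after the change
    }

    // Simplifying assumption: identity is the title only, so the hash code survives genre changes.
    @Override public int hashCode() { return title.hashCode(); }
    @Override public boolean equals(Object o) {
        return o instanceof Movie && title.equals(((Movie) o).title);
    }
}

class VideoLibrary implements GenreListener {
    private final Set<Movie> movies = new HashSet<>();
    private final Map<Genre, Set<Movie>> moviesByGenre = new HashMap<>();

    void addMovie(Movie movie) {
        if (movies.add(movie)) {
            moviesByGenre.computeIfAbsent(movie.getGenre(), g -> new HashSet<>()).add(movie);
            movie.addGenreListener(this); // observe the reference we now share with the caller
        }
    }

    Set<Movie> getByGenre(Genre genre) {
        Set<Movie> result = moviesByGenre.get(genre);
        return result == null ? Collections.<Movie>emptySet() : Collections.unmodifiableSet(result);
    }

    // Moves the movie to the bucket of its new genre, restoring invariant part (2).
    @Override
    public void genreChanged(Movie movie, Genre oldGenre) {
        Set<Movie> oldBucket = moviesByGenre.get(oldGenre);
        oldBucket.remove(movie);
        if (oldBucket.isEmpty())
            moviesByGenre.remove(oldGenre);
        moviesByGenre.computeIfAbsent(movie.getGenre(), g -> new HashSet<>()).add(movie);
    }
}
```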

While this technique can be appropriate in some cases, it is not a general solution; in many cases it cannot be applied:

  • A mutation may preserve the shared object’s own invariant and still be illegal from the main class’ point of view; the listener cannot always “fix” the invariant of the main class after an arbitrary change to the shared object.
  • The main class has to be extra careful when it mutates the shared objects by itself, since this triggers the listeners as well. If it does so in the middle of a transaction, the class may be in a completely inconsistent state when the listeners are executed. This complicates the implementation of the listeners.

Summary

Protecting the state of a class becomes relevant only if it shares mutable references with its clients (through parameters/returned values), and mutations of these referenced objects have the potential of breaking the class invariant.
However, protection is not always warranted even under these circumstances. If we trust our clients, and the performance impact of every applicable protection technique is unacceptable, we may choose to leave the class unprotected and add detailed documentation regarding the client’s responsibility.
Remember that API documentation is important anyway, even if we apply active protection. If we perform a defensive copy, it is nice to let the client know, so that he can avoid making another defensive copy on his part. Similarly, if we return a decorated object, the client should be aware of the consequences of calling its modifier methods.

What is the price of not protecting a class from reference-inflicted damage? An unprotected class is never a module by itself. It cannot be tested as a stand-alone unit, and any bug in other classes can easily “infect” it. Systems built of this kind of class are difficult to debug, since it is harder to isolate problems and analyze them.

Lastly, it’s worth noting that a client can always break our class invariant if he really wants to. Although Java does not allow the pointer manipulations that can easily break data integrity in other languages, it does allow similar damage by means of reflection. The invariants of classes that are not thread safe can also be broken through concurrent access, since race conditions are likely to cause data corruption.
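To illustrate the reflection loophole, here is a sketch with a hypothetical Range class whose constructor enforces min <= max; a client can still overwrite the private field directly. It assumes code running in the unnamed module (the classpath), where setAccessible(true) on the application's own classes succeeds.

```java
import java.lang.reflect.Field;

class Range { // hypothetical class with the invariant min <= max
    private int min;
    private int max;

    Range(int min, int max) {
        if (min > max)
            throw new IllegalArgumentException("min > max");
        this.min = min;
        this.max = max;
    }

    boolean isValid() { return min <= max; }
}

class ReflectionAttack {
    // Overwrites the private field 'min', bypassing the constructor's check.
    static void breakInvariant(Range range) {
        try {
            Field min = Range.class.getDeclaredField("min");
            min.setAccessible(true);           // lifts the 'private' access check
            min.setInt(range, Integer.MAX_VALUE);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```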

I would like to thank Dr. David Faitelson, my friend and tutor, for inspiring this blog post with his work on the theory of modularity.



Posted in Design, java