Wednesday, July 20, 2011

Removing entries from a Java Collection

Let's look at some simple code:
public static void main(String[] args) {
  Map<String, String> m = new HashMap<String, String>();
  m.put("A", "B"); m.put("C", "D"); m.put("E", "F");
  for (Map.Entry<String, String> e : m.entrySet()) {
    if (e.getValue().equals("D")) {
      m.remove(e.getKey());
    }
  }
} 
After setting up a map which contains three entries (A->B, C->D, E->F), it iterates over its entries and removes an entry, if it matches a certain condition.
Compile, Run, works perfectly (on my machine, with my JDK, etc.)

Now we just change the condition to:
if (e.getValue().equals("B")) {

Compile, Run, oops - we get a ConcurrentModificationException like this:
Exception in thread "main" java.util.ConcurrentModificationException
  at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
  at java.util.HashMap$EntryIterator.next(HashMap.java:834)
  at java.util.HashMap$EntryIterator.next(HashMap.java:832)
  at prv.rli.codetest.CodeTest.main(CodeTest.java:11)

Why?
  • The for (x : y) statement is syntactic sugar for the old school collection iteration with an Iterator:
    Iterator<Map.Entry<String, String>> it = m.entrySet().iterator();
    while (it.hasNext()) {
    Map.Entry<String, String> e = it.next();
    }
  • An Iterator may throw a ConcurrentModificationException (a RuntimeException), if you change the collection it iterates over.



But why only with "B"?
  • Pure luck. The HashMap happens to be ordered like this: {E=F, A=B, C=D}, therefore the entry with value D is the last one and since the iterator will return false for hasNext(), we leave the loop, next() is no longer called and nothing happens.
So what is the correct way of removing an element from a map? There are several ways, which we will look at: Using the remove method of the iterator
Iterator<Map.Entry<String, String>> it = m.entrySet().iterator();
  while (it.hasNext()) {
    Map.Entry<String, String> e = it.next();
    if (e.getValue().equals("B")) {
      it.remove();
    }
  }
The Iterator interface provides a remove method, which will remove the element the last call to next() returned. Unfortunately the remove method may not be supported by some iterators. In this case the call to remove() would throw an UnsupportedOperationException. If we cannot use it.remove() we can also remember the keys of elements to remove and remove them after the iteration has completed:
  Set<String> toRemove = new HashSet<String>();
  for (Map.Entry<String, String> e : m.entrySet()) {
    if (e.getValue().equals("B")) {
      toRemove.add(e.getKey());
    }
  }
  m.keySet().removeAll(toRemove);
Note that we can use the for (x : y) notation here. Finally, if you remove most elements of a collection you can also create a new collection and only copy those elements you want to keep:
  Map<String, String> m2 = new HashMap<String, String>();
  for (Map.Entry<String, String> e : m.entrySet()) {
    if (!e.getValue().equals("B")) {
      m2.entrySet().add(e);
    }
  }
  m = m2;

This may have advantages in terms of memory consumption, because the new collection will not waste space for deleted entries.