Wednesday, November 17, 2010

Collection Performance - Don't become too lazy

Situation

A few weeks ago I was performance tuning some code that ran quite regularly and took more time than it should, judging from the complexity of what it was doing. As usual, merely looking at the code revealed nothing blatantly, obviously, completely and ridiculously wrong - after all, it worked correctly.

So I fired up trusty old JProfiler and had a look, only to find that sometimes you can become just too lazy and reliant on libraries without understanding their characteristics deeply enough.

The Code

The following snippet shows the relevant method of the code under analysis. The surroundings are not really important, the heart of the matter is in here:

protected List<String> getCurrentChunkList() {
    List<String> all = getContext().getFullList();
    List<String> result = new ArrayList<String>(chunkSize + 2);
    int tempCount = 0;
    for (Iterator<String> allIter = all.iterator(); allIter.hasNext()
                && tempCount < chunkSize; tempCount++) {
        result.add(allIter.next());
        // once the chunk is full, pull over any immediately following
        // elements equal to the last one, so they stay in one chunk
        if (tempCount == chunkSize - 1) {
            while (allIter.hasNext()) {
                String current = allIter.next();
                if (current.equalsIgnoreCase(result.get(result.size() - 1))) {
                    result.add(current);
                } else {
                    break;
                }
            }
        }
    }
    all.removeAll(result);
    return result;
}

Basically all this does is take a portion of a long list of Strings and transfer it to a partial list, with some data-specific fiddling at the end of a chunk. Once that has been completed, the transferred elements are removed from the primary list.

The whole thing is part of a batch processing system that is only allowed to run for a limited time slice at a time. The overall list of items still to be processed is stored persistently in a context object; on each run a certain number of them is extracted, the master list is shortened, and the resulting chunk is returned to the actual processing. Control is then returned to the surrounding framework, which can assign a new time slice to this kind of job, or to a different one, at its discretion. This is repeated until the master list is empty.

The all list contains about 10,000 elements; the chunk size is set to 1,000. Now, this does not look too bad, does it?
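To put the numbers in perspective, the surrounding framework invokes this method once per time slice until the master list is drained - roughly like the following sketch (runTimeSlice() and process() are made-up names for illustration, not the actual framework API):

protected void runTimeSlice() {
    // extracts one chunk and shortens the persistent master list
    List<String> chunk = getCurrentChunkList();
    for (String item : chunk) {
        process(item); // the actual batch work, hypothetical name
    }
    // control then returns to the framework, which schedules further
    // slices until getContext().getFullList() is empty
}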

First run

When running the processing job for the mentioned 10,000 elements in chunks of 1,000, the whole process took 490 seconds; of that, the above method accounted for a whopping 6.3%, or 31 seconds! A little long, considering that it is just calling a few supposedly well-optimized collection operations…

But looking at it more closely, these “few” operations turn out to be quite a few more. With 1,000 elements per chunk, there were

  • 10 invocations of getCurrentChunkList
  • 10,000 invocations of Iterator.hasNext
  • 10,000 invocations of Iterator.next
  • 10,000 invocations of ArrayList.add
  • 10 new ArrayLists
  • 10 new Iterators
  • 10 invocations of ArrayList.removeAll

Not exactly minimalistic, but 31 seconds seems a little over the top.

The problem lies with the ArrayList.removeAll call - internally, an ArrayList uses a regular array as its backing store. The implementation of removeAll boils down to a looped call to the single-element remove. That remove internally iterates the array, doing an equality check to find the element to be removed, and finally removes it from the array once found.

Unless the element to be removed is at the end of the backing array, there is a lot of bookkeeping to do, because removing something from the middle of an array is not an easy thing - a good chunk of the array needs to be shifted over to fill the gap.

While this uses the fairly efficient System.arraycopy in its implementation, it still adds up, because the algorithm shown above removes the elements it picks for processing from the beginning of the list, leading to several thousand array-shifting operations.
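The effect is easy to reproduce in isolation. The following little stand-alone benchmark is my own illustration, not part of the batch job; absolute numbers will of course vary with machine and JDK, but on Sun's JDK the front-removal variant will be dramatically slower:

import java.util.ArrayList;
import java.util.List;

public class RemoveDirectionDemo {

    private static final int SIZE = 100000;

    public static void main(String[] args) {
        System.out.println("removing from front: " + drain(true) + " ms");
        System.out.println("removing from end:   " + drain(false) + " ms");
    }

    // Fills a list with SIZE elements and then removes them one by one.
    // Removing index 0 forces an arraycopy of all remaining elements on
    // every single call; removing the last index shifts nothing at all.
    private static long drain(boolean fromFront) {
        List<String> list = new ArrayList<String>(SIZE);
        for (int i = 0; i < SIZE; i++) {
            list.add("element" + i);
        }
        long start = System.nanoTime();
        while (!list.isEmpty()) {
            list.remove(fromFront ? 0 : list.size() - 1);
        }
        return (System.nanoTime() - start) / 1000000;
    }
}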

For reference, this is the relevant part of the source of ArrayList (Edit: from Apache Harmony!):

public E remove(int location) {
    E result;
    int size = lastIndex - firstIndex;
    if (0 <= location && location < size) {
        if (location == size - 1) {
            result = array[--lastIndex];
            array[lastIndex] = null;
        } else if (location == 0) {
            result = array[firstIndex];
            array[firstIndex++] = null;
        } else {
            int elementIndex = firstIndex + location;
            result = array[elementIndex];
            if (location < size / 2) {
                System.arraycopy(array, firstIndex, array, firstIndex + 1,
                                 location);
                array[firstIndex++] = null;
            } else {
                System.arraycopy(array, elementIndex + 1, array, elementIndex,
                                 size - location - 1);
                array[--lastIndex] = null;
            }
        }
        if (firstIndex == lastIndex) {
            firstIndex = lastIndex = 0;
        }
    } else {
        throw new IndexOutOfBoundsException(
            // luni.0A=Index: {0}, Size: {1}
            Messages.getString("luni.0A", //$NON-NLS-1$
                Integer.valueOf(location),
                Integer.valueOf(lastIndex - firstIndex)));
    }
    modCount++;
    return result;
}

Edit: After I had posted this - including the previous source snippet from the DocJar website, which I came across when looking for "ArrayList.java source" - I realized that, according to that source, removing elements from the front of the master list should have been just as cheap. But that is not what I saw when I tried it out on my machine. It turns out I had grabbed the Apache Harmony implementation by accident, which is implemented more efficiently than the one Sun's JDK 6 includes. The following snippet shows Sun's version, which is the one I had used in my profiling; it does not have the "remove from front" optimisation of the implementation above.

public E remove(int index) {
    RangeCheck(index);

    modCount++;
    E oldValue = (E) elementData[index];

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index, numMoved);
    elementData[--size] = null; // Let gc do its work
    return oldValue;
}
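The key difference: Harmony tracks both ends (firstIndex and lastIndex) of the used region of the backing array, so removing the first element is just a pointer move, while Sun's version only tracks the size and therefore has to shift everything after the removed element. To illustrate the idea only - this is my own minimal sketch of the two-index technique, not Harmony's actual code, with bounds and growth checks omitted:

// Array-backed buffer that can shrink at either end without copying.
class TwoEndedBuffer<E> {
    private final Object[] array;
    private int firstIndex; // inclusive start of the used region
    private int lastIndex;  // exclusive end of the used region

    TwoEndedBuffer(int capacity) {
        array = new Object[capacity];
    }

    void add(E element) {
        array[lastIndex++] = element;
    }

    @SuppressWarnings("unchecked")
    E removeFirst() {
        E result = (E) array[firstIndex];
        array[firstIndex++] = null; // just a pointer move, no arraycopy
        return result;
    }

    @SuppressWarnings("unchecked")
    E removeLast() {
        E result = (E) array[--lastIndex];
        array[lastIndex] = null; // same at the other end
        return result;
    }

    int size() {
        return lastIndex - firstIndex;
    }
}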

Possible optimisations

As one can see from the source of ArrayList.remove (Sun JDK 6) above, the ideal case is to remove elements from the end of the array, because that does not require any element shifting or array copying whatsoever. The remove operation simply sets the last element to null and decrements the internal size counter.

So I changed the code of the initial method to the following:

protected List<String> getCurrentChunkList() {
    List<String> all = getContext().getFullList();
    List<String> result = new ArrayList<String>(chunkSize + 2);
    int tempCount = 0;
    for (int i = all.size() - 1; i >= 0 && tempCount < chunkSize; i--, tempCount++) {
        result.add(all.remove(i));
    }
    return result;
}

This code removes strictly from the end of the all list and adds each element to the current result chunk. Going this route also allowed the special fiddling in the middle of the old method to be dropped, even though that had not contributed any measurable overhead to the method’s execution time. The call to removeAll is gone completely, because each remove already takes care of that individually.
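One side effect worth noting: since the elements are taken from the end, each chunk comes out in reverse order relative to the master list. If the original order had mattered, a subList view would have been an alternative that still avoids the per-element shifting, because clearing a range of an ArrayList is implemented with a single internal arraycopy (removeRange). A sketch of that variant - the method name is made up, and it drops the special fiddling just like the version above:

protected List<String> getCurrentChunkListInOrder() {
    List<String> all = getContext().getFullList();
    int n = Math.min(chunkSize, all.size());
    // copy the first n elements in their original order
    List<String> result = new ArrayList<String>(all.subList(0, n));
    // clears the whole range with one arraycopy instead of n
    all.subList(0, n).clear();
    return result;
}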

Second run

Now, with the same data set the profiling comes to a very different conclusion:

  • 10 invocations of getCurrentChunkList
  • 10,000 invocations of ArrayList.remove
  • 10,000 invocations of ArrayList.add
  • 10 new ArrayLists

With this new version, the time spent in this particular piece of code was reduced to a mere 23 ms - less than 0.005% of the total execution time.

Conclusion

Even though the Java Collections framework’s classes are well tested and contain lots of performance-enhancing shortcuts, it is still vital to be aware of their characteristics. Although the original method looked perfectly fine at first glance and would not show up on the radar for a few runs, its repeated execution added up to quite a remarkable percentage of the overall process.

With just a few minor modifications - which also made the code easier to understand by getting rid of the intermediate Iterator and the special handling for edge cases - that single method’s performance was boosted significantly. Considering that it is part of a more general batch job framework, the overall improvements are even greater than what this single isolated test case showed.

5 comments:

Anonymous said...

Good to know.... Sometime

MrPotes said...

I'd be interested to know what the difference in timings would be if (using the original code) every time result.add was called, you also called allIter.remove(), rather than doing the removeAll at the end?

Daniel Schneller said...

Even though I did not try this, it would probably be about the same - as removeAll simply iterates the passed collection and calls remove for each item, the same amount of rearranging of the backing store would be needed. Perhaps there is a subtle difference, depending on whether the iterator internally already knows the array index, maybe saving that lookup. But that lookup is hardly measurable in lists of this size.

MrPotes said...

Ultimately removeAll will be doing something along the lines of a nested iteration, depending on the underlying List implementation used. If allIter.remove is used, none of that repeated iteration is required. Should result in some improvement, although I'd agree probably not to the extent of using indexes.

DanielKr said...

Hi!

Interesting article. I'm guessing the original code is not copy/pasted since it appears to be broken (in addition to being very inefficient). Specifically, tempCount == chunkSize will never be true inside the loop, and also I suspect the references to "tCurrent" should really refer to the local declared as "current"?

Regards,
Daniel K