XP's Solilloquy - Haskell's function composition and performance

Tags: Programming, Haskell

Haskell is a pure functional programming language, therefore, function composition is inherent in the way we code in Haskell. And the Haskell platform libraries do provide a lot of functions to make life easier, you just combine them, in the right order, to create your solution, which is really neat.

For example, let’s take a look at the following code:

let n = 1000
let a = (map (* 2)) . (filter even) $ [1..n]
print $ length a

It takes a list of integers, filters out all odd numbers, and doubles all the even numbers (multiply each one of them by 2). Then, we just count of the numbers in the final list, and print out the result. The functions map, filter, even etc, are all provided in the standard library. As you can see, you can easily compose very complex solutions by chaining functions together, until you get the result you want. And this is the basic pattern of functional programming.

However, what’s wrong with that?

Nothing wrong obviously, but you really want to know how these functions are defined, and you want to be careful what kind of data you are working with, especially, if performance is important.

What if n in the code above is large, let’s say, 100 million? Then, when you run the code above, you get:

time ./test

real	0m8.775s
user	0m10.512s
sys	0m1.018s

It took more than 8.775 seconds! Why? Well, the code involves many many recursions, and many many list concatenation. In this case, the function filter is the killer. It takes a predicate and a list, and applies the predicate, recursively, to each element in the list, and concatenate back to a new resulting list. And we have not even looked at the operational cost of other functions yet.

Now, let’s see how the performance number would differ, if we just use normal list comprehension, another functional programming pattern which is akin to looping in other languages:

let n = 1000
let a = [x*2 | x <- [1..n], x `rem` 2 == 0]
print $ length a

And the execution time was:

time ./test

real	0m4.155s
user	0m4.886s
sys	0m0.500s

As you see, this is a lot faster than the previous code.

Conclusion

Function composition is great, and it makes your codes more readable and easier to understand, but just be careful about the data that you are applying these functions to. You really want to know how thet are defined, before you use them.

Does it mean that list comprehension, thus looping, will always be faster? Not necessarily. If you have tail recursion, there’s no overhead and the performance would be very similar to a loop. As a matter of fact, the code generated by the compiler would be almost the same as code generated from a normal loop.

Besides, in both examples above, we are running the programs with only one single core. In an era of multi-core programming, you really want to utilize all the CPUs and cores that you have. To take advantage of all these computing power, you want to divide the task into smaller tasks, and distribute the smaller tasks to all cores to compute in parallel. Dividing up a large task into smaller parallel tasks while performing list comprehension is going to be pretty hard, while it would be much easier if you use higher order function.

We will take a look at this issue in a future post.