There’s more than one way to map a cat

There are lots of data structures out there, ranging from primitive to so sophisticated that only a single person in the world understands them (these are, of course, mostly useless). The choice of the data structure is mostly specific to the problem; however, obviously some data structures are generally more popular/useful than others.

I’d say that the most generally useful data structure in the imperative world is an array. Arrays are as good as you can get if you do not need fast searching over large datasets – they have a simple efficient memory pattern (unless you need a multi-gigabyte array), leading to fast iteration, they have minimal meta data (per instance cost is zero), they can be indexed in guaranteed constant time, array transformations are trivially parallelizable, they generalize to multiple dimensions easily, etc.

However, in functional world, from my limited experience it looks like the favorite data structure is a consed list, which is better known as a singly-linked list. There is a type, which is called cons cell, which is a pair. A list consists of cons cells, which are linked together – the first element of a cons cell signifies the data element, the second one points to the next cell, or to the special object, nil, which represents an empty list:

(1 2 3 4) == (cons 1 (cons 2 (cons 3 (cons 4 nil))))

For some reason, a linked list is also the favorite data structure of our game logic programmers, though this has nothing to do with functional programming.

Consed lists are present in lots of functional programming languages; in some, they’re the only basic structure (Lisp/Scheme rely heavily on consed lists, even the code is composed of s-expressions which are stored in consed list form; Haskell strings are consed lists, etc.). However, for a programmer who spends half of his work day in a profiler, consed lists are a rather bad structure.

The benefits of consed lists are:

  • A new element can be added to the list in constant time, without mutation.

Not much, is it? The drawbacks are:

  • The memory access pattern is unpredictable and usually bad, leading to cache misses;
  • Even with a good allocator, at least half of the memory is used for meta data, increasing bandwidth;
  • Parallel processing is hard; if the processing function is relatively cheap, there will be no gains from parallelism;
  • While you can easily insert an element before the first one, you can’t easily append to the list. There is nothing pretty about non-symmetrical data structures;
  • You can’t easily remove an element from the list either;
  • Each cell is a heap-allocated object, which puts considerable pressure on garbage collector in GC’d environments.

There is a function, called map (mapcar in Common Lisp), which takes a consed list and a one-argument function, and returns a new list, which is produced by application of the function to all elements of the source list. This function, obviously, has O(N) time complexity and O(N) space requirements (after all, it generates a new list!). Looking at various possible implementations of this function gives some insight into problems of consed lists in particular and functional programming style in general.

My language of choice for today is F#, which is a multi-paradigm language (with both functional and object-oriented imperative elements) based on .NET platform. Of course, F# has a built-in consed lists, so we’ll start with them.

Naive recursive approach

When an imperative programmer has to implement a map function, his first natural reaction is to write a for loop. However, for loops usually have a mutable iterator, and thus are either discouraged or not present at all in functional languages. Instead, functional programmers like to use recursion. Indeed, a recursive implementation of map is straightforward, once you get used to recursive functions:

let rec mapcat1 pred lst =
    match lst with
    | car :: cdr -> (pred car) :: (mapcat1 pred cdr)
    | [] -> []

Aside from some syntactic weirdness (:: is used to make a cons cell, function arguments are specified without commas and without braces, I use pattern matching here instead of ifs), the code should be self-explanatory. Mapping an empty list produces an empty list; mapping anything else can be done recursively by making a cons cell with first element being a transformed first element of the original list, and the rest being a result of recursive map.

This function works, but it has a tiny problem – it’s recursive. Wait, what?

Tail-recursive approaches

You see, as you’ve been likely taught, recursion is bad. Each call to a recursive function pushes function arguments and the return address to the stack, creating a stack frame; a recursive function like map creates N stack frames for list of N elements, so it requires O(N) temporary memory. What’s worse, however, is that in some functional languages, like F#, the size of stack is bounded, so processing a large list is going to generate a StackOverflowException (Haskell, on the contrary, is happy to grow the stack until no address space is left in the process).

To solve this problem without loops, a concept of tail recursion is introduced. If the function call is the very last thing the function does, there is no need for additional memory – the old stack frame is not needed after the call, so the new frame can replace the old frame. For some languages the tail-recursion is an additional optimization (for example, some C++ compilers do it), for other it’s a spec requirement (i.e. a guaranteed feature).

There is a common recursive function transformation pattern, which is almost never done automatically by the compiler, but is usually easy to do by hand. Our map function can be transformed into this one:

let mapcat2 pred lst =
    let rec loop rest acc =
        match rest with
        | car :: cdr -> loop cdr (pred car :: acc)
        | [] -> acc
    loop lst []

There is an inner recursive function, which is conveniently named loop (which, I guess, shows my imperative background); the part of the list which is already built is stored in the acc(umulator) argument, which is updated at each call with the new element.

Note that loop cdr (pred car :: acc) is tail recursive, but (pred car) :: (loop pred cdr) is not – there is an extra cons operator after the function call.

The new version of the function works without stack overflows even on large inputs. However, due to the code structure change, it produces the list with the reversed order of elements, because we walk through the list and prepend the elements to the result, so the first element of the original list will be the last element of the new list.

Well, it’s easy – instead of prepending, we’ll append!

let mapcat3 pred lst =
    let rec loop rest acc =
        match rest with
        | car :: cdr -> loop cdr (acc @ [pred car])
        | [] -> acc
    loop lst []

I’ve changed :: to @, which is an append operator. Victory!

Wait, why is the program still running?

Remember I’ve said that consed lists are not symmetrical? You can easily prepend an element, but you don’t know where the last element of the list is, so append has to iterate through the entire left argument, making the new function quadratic. Oops. Note that this won’t be a problem with doubly linked lists, but, unfortunately, append was not a priority 50 years ago, when Lisp was accidentally created.

Ok, seems we’ll have to just reverse the result. A relatively common interview question is to code a single linked list reverse function. An inplace version is usually required, so that the old list gets destroyed in the reversing process; we can’t do that here, because consed cells, like all other objects in a pure functional world, are immutable – you can’t change them once they’re created. So we have to make an additional copy of the list (we can use the same map function for this, because it conveniently returns a transformed reversed list):

let mapcat4 pred lst =
    mapcat2 pred lst |> reverse

let mapcat5 pred lst =
    mapcat2 pred lst |> mapcat2 (fun x -> x)

|> is an F# pipeline operator, you can read it as “the left part’s expression result is passed as an additional argument to the right part”.

Now we finally have a correct solution; it is tail recursive, so it consumes O(1) stack frames, but it creates an additional copy of the list, so it still needs O(N) temporary memory. However, it works in F# for long lists.

Continuation-passing style

There is another workaround for the stack frame problem, which is very functional in spirit. What we need in order to build our list in the right order is to traverse through the list, remembering the chain of nodes, and then to unwind the chain starting from the last node. This is essentially what we do in a recursive approach, however we can stay tail recursive if the unwinding chain will be formed from continuations.

Think of it this way: we have to make a function, which, given a tail of the result, prepends another element to it, forming the next tail. If we have N such functions, and each is calling the next one, then the unwind chain will be formed from the function calls, which coincidentally happen to be tail recursive.

This is an example implementation (fun x -> expr is a way to create an anonymous function with argument x which evaluates expr as its result):

let mapcat6 pred lst =
    let rec loop rest cont =
        match rest with
        | car :: cdr -> loop cdr (fun acc -> cont (pred car :: acc))
        | [] -> cont []
    loop lst (fun x -> x)

We start with an identity function, which, given an argument x, returns it back. Then we form a function chain, which for a list [1; 2; 3] will look like this (in the order of creation):

fun x -> x
fun acc1 -> (fun x -> x) (pred 1 :: acc1) // this is equal to fun acc1 -> pred1 :: acc1
fun acc2 -> (fun acc1 -> pred 1 :: acc1) (pred 2 :: acc2) // this is equal to fun acc2 -> (pred 1) :: (pred 2) :: acc2
fun acc3 -> (fun acc2 -> (pred 1) :: (pred 2) :: acc2) (pred 3 :: acc3) // this is equal to fun acc3 -> (pred 1) :: (pred 2) :: (pred 3) :: acc3

And finally the result is called with an empty list, which results in (pred 1) :: (pred 2) :: (pred 3) :: [], which is what we want.

We’ve successfully traded N stack frames by N closure instances, hurray! (amazingly, this is faster than the original recursive version; see below for timings).

Imperative world

Timing the standard function, which does the same thing, showed that it’s way faster than the fastest of the above. The only way to optimize this, as far as I understand, is to introduce mutable data structures, which means introducing a special structure instead of the built-in cons cell (Scheme-aware readers can immediately recognize that Scheme has a built-in set-cdr! function, which is what we’ll need here).

The code is very much imperative, apart from the tail-recursion-instead-of-loops, so I’ll leave it without explanations:

type Cell =
    val car: int
    val mutable cdr: Cell

    new (car_, cdr_) = { car = car_; cdr = cdr_; }

let nil: Cell = Unchecked.defaultof<Cell>
let cons car cdr = Cell(car, cdr)

let mapcat2_mut pred lst =
    let rec loop (rest: Cell) (last: Cell) =
        if System.Object.ReferenceEquals(rest, null) then
            let cell = cons (pred nil
            last.cdr <- cell
            loop rest.cdr cell

    let head = Cell(0, nil)
    let res = loop lst head

Note that I always create a stub object which is thrown out; that’s because I’m lazy.


Now let’s stuff everything in a single program, add some timing code, and look at the results (the complete F# source is available here).

The program, when run, outputs the following:

"recursive" took 1.656725 ms
"tail-recursive (cons)" took 0.341175 ms
"tail-recursive (cons) + reverse" took 0.922769 ms
"tail-recursive (cons) + reverse (via map)" took 0.943074 ms
"tail-recursive (continuations)" took 1.110428 ms
"standard" took 0.496710 ms
"mutable recursive" took 1.430596 ms
"mutable tail-recursive" took 0.563062 ms
"mutable loop" took 0.577617 ms

So, as we can see, the fastest way is the function, which is closely followed by our mutable variant (there are both tail-recursive and loop versions here – F# has native support for loops); the next best are the functions which construct two lists, followed by the continuation version (amazing!), and, finally, by recursive version. The first tail-recursive variant is the fastest of them all, but it’s incorrect.

How did they do that? Why is as fast (well, it’s even 10% faster), than our mutable version, given that the F# list node is immutable? I’ve studied the F# assembly using ildasm, and found out, that…

… they mutate the resulting list. creates a head node from the first element of the list, and then calls mapToFreshConsTail, which creates the rest, and modifies the tail (cdr) of the cells in the process.

Conclusion: When purity and performance collapse, performance usually wins.

Oh, and using arrays here results in 0.1 ms runtime, which is 5x faster than the fastest list-based solution. Just saying.

This entry was posted in F#, Functional. Bookmark the permalink.

2 Responses to There’s more than one way to map a cat

  1. Spudd86 says:

    ‘course the REALLY REALLY good runtimes will sometimes use arrays and pretend they are cons lists, or mutate things if they can prove the original value is no longer needed (ie is garbage) In fact I believe it’s a fairly common thing in Scheme and Lisp runtimes for modern machines to do this (back in the days of the Lisp machine you didn’t have to of course… but back then a fetch from memory was cheaper than multiply…)

  2. There are functional programming languages that can do this sort of thing efficiently.

    For example, ATS ( has support for linear types. In the case you mentioned (the ‘map’ function) it is possible to mutate the resulting list, but it looks more “functional” in spirit.

    I won’t post the implementation here (it’s fairly large and too involved for first-time readers), but you can find, for example, the in-place reverse function example (search for “reversing a singly linked lists”). “Map” would be implemented fairly easily using a linear data type of singly-linked list segments, with all the mutation happening inside the function.

    It is my hope that future FPLs will be efficient as well as elegant, by adopting linear types or some sort of “separation types”, even.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s