In praise and condemnation of the list comprehension

Python has group of related features (list comprehensions, dict comprehensions, generator expressions) which support special syntactic sugar for some basic sequence operations.

Here's the orignal design intent behind list comprehensions:

Rationale

List comprehensions provide a more concise way to create lists in situations where map() and filter() and/or nested loops would currently be used.

Let's see how it works out on a couple of examples:

Simple example: Remove odd numbers

Imperative loop

list result, eager

result = []
for num in range(20):
    if num % 2 == 0:
        result.append(num)

generator, lazy

def odd_numbers():
    for num in range(20):
        if num % 2 == 0:
            yield num

result = odd_numbers()

List comprehension

list comprehension, eager

result = [num for num in range(20) if num % 2 == 0]

generator expression, lazy

result = (num for num in range(20) if num % 2 == 0)

Using filter()

converted to list, eager

result = list(filter(lambda num: num % 2 == 0, range(20)))

lazy iterator

result = filter(lambda num: num % 2 == 0, range(20))

In this fairly simple example, the list comprehension appears to be the superior choice. It is concise, it reads nicely and we can easily turn the result into a lazy sequence by converting it to a generator expression.

Slightly more complicated example: Making some tasty apple treats

Here's a recipe which we'll try to express as a python program:

  1. start with a bunch of apples
  2. discard the bad apples
  3. peel away the stickers
  4. slice them up into pieces of the same size
  5. season with cinnamon and sugar
  6. wrap in laminated dough
  7. bake
  8. enjoy
apples = [
    {"type": "Granny Smith", "size": 2, "contains_worm": True, "has_sticker": True},
    {"type": "McIntosh", "size": 4,"contains_worm": False, "has_sticker": True},
    {"type": "Granny Smith", "size": 1, "contains_worm": False, "has_sticker": True},
    {"type": "Pink Crisp", "size": 3, "contains_worm": False, "has_sticker": True},
    {"type": "Pink Crisp", "size": 2, "contains_worm": True, "has_sticker": True},
]

def is_good(apple):
    return not apple["contains_worm"]

def peel_sticker(apple):
    return dict(apple, has_sticker=False)

def cut_into_pieces(apple):
    # use list comprehension only to create a collection of new records, 
    # not the main focus of this example
    return [dict(apple, size=1) for _ in range(apple["size"])]
    
def add_seasoning(piece):
    return dict(piece, seasoning=["cinnamon", "sugar"])
    
def wrap_in_dough(item):
    return {"baked": False, "content": [item]}
    
def bake(item):
    return dict(item, baked=True, delicious=True)

Let's start with the comprehension this time.

treats = [bake(wrap_in_dough(piece)) for piece in cut_into_pieces(peel_sticker(apple)) for apple in apples if is_good(apple)]

The same, but reformatted using black:

treats = [
    bake(wrap_in_dough(piece))
    for apple in apples
    for pieces in cut_into_pieces(peel_sticker(apple))
    if is_good(apple)
]

This is not so good anymore. Line by line, we go from talking about pieces, to apples, to pieces and apples and finally to only apples again. The steps involved in the process are disordered and intermingled with the compoments of the comprehension expression.

Compare that to a solution using filter, map and chain:

treats = list(
    map(bake,
        map(wrap_in_dough,
            chain.from_iterable(
                map(cut_into_pieces,
                    map(peel_sticker, 
                        filter(is_good, apples)
                    )
                )
            )
        )
    )
)

In this solution, each of the process steps are nicely separated, but the deep nesting of function calls is unpleasant. The thing which happens first in pre process (filtering good apples and peeling stickers), is the last thing we read in our code.

Finally, a collection assembled using an imperative loop.

treats = []
for apple in apples:
    if is_good(apple):
        for piece in cut_into_pieces(peel_sticker(apple)):
            treats.append(bake(wrap_in_dough(piece)))

This feels like the right way to go in python. We already removed some of the deep nesting and we can go even further if we invert the filtering condition. Because this is not just a single expression, we can also add some local variable assignments to make the whole thing a bit more tidy.

treats = []
for apple in apples:
    if not is_good(apple):
        continue
    
    clean_apple = peel_sticker(apple)
    for piece in in_cut_into_pieces(clean_apple):
        raw_treat = wrap_in_dough(piece)
        baked_treat = bake(raw_treat)
        treats.append(baked_treat)

Perhaps those variable assignments aren't super necessary here, but you get the point. There is a lot of flexibility in what we can do inside of a regular for loop as opposed to a list comprehension expression.

But something got lost in this solution. We started with a simple list of steps in our recipe, and we can probably imagine adding or altering some of the steps in isolation - perhaps we receive apples in boxes or baskets and we must first unpack them, or we're tasked with producing batches of treats always from the same type of apple, so an additional grouping step would be required, or maybe we want to bake all the treats at once instead of one at a time. In our recipe, such changes could be expressed by simply adding another step, but in the imperative loop solution we might have to change the whole structure of the program to accommodate it.

Other languages

For perspective, it may be useful to look at how other languages tackle this kind of issue.

Clojure for example has threading macros which let us compose a sequence of operations without heavy nesting. ->> performs a transformation of the forms within the body of the macro call, and it can be used with any function, not only map, filter and mapcat.

(def treats
  (->> apples
       (filter is-good?)
       (map peel-sticker)
       (mapcat cut-into-pieces)
       (map wrap-in-dough)
       (map bake)))

A similar feature in the form of the pipeline operator can be found in F# , Elm and others.

treats =
  apples
    |> List.filter isGood
    |> List.map peelSticker
    |> List.concatMap cutIntoPieces
    |> List.map wrapInDough
    |> List.map bake

This is only syntactic sugar which allows us rearrange the function calls so they're in the order in which the data passes through them. It allows for easier composability than deep nesting and also reads nicer.

JavaScript can handle such cases with method chaining:

let treats = apples
  .filter(isGood)
  .map(peelSticker)
  .map(cutIntoPieces)
  .reduce((left, right) => left.concat(right))
  .map(wrapInDough)
  .map(bake);

Conclusion

In my opinion, higher order collection operations such as map, filter and chain.from_iterable (mapcat) are fundamental in construction of programs. They should have first class support in the languages we use, the same way as numeric operators or string manipulation methods.

There are plenty of examples of such functionality in other languages, which I believe demostrates that programmers have a need for it.

It is a shame that python doesn't give us strong tools to combine and compose operations in this way. Outside of very simple situations, the list comprehension does more to obfuscate the solution than to improve it.

We often have to fall back to manually assembling the result in an imperative loop. The collection transformations are already conceptually clean and they compose well, so it is unfortunate we have to take them apart and fall back to a lower-level language feature to express them.