Inside the Standard Library: Sequence.map()

Learn how the Swift Standard Library is built by coding it yourself.

Paul Hudson February 4th 2019 @twostraws

This is the first article in my new series, Inside the Standard Library. Here we'll be looking at the map() method on sequences, which transforms elements in a sequence using a function you specify.

This article is also available as a YouTube video

As this is the first installment in the series, I want to take just one minute to explain how this series works, so please stick with me for a moment. Each article will be made up of three steps:

We’ll focus on one commonly used method from the Swift standard library, such as flatMap(), compactMap(), or reduce().
We’ll try to recreate the method from scratch in Swift, so you can see exactly how it works. My hope is that you'll see these things aren't special or magic, and are in fact quite understandable when you break them down.
Once we've coded our own naïve solution to the problem, we're going to look at the equivalent method from the Swift standard library – we'll look at how the Swift team solved the same problem, including what techniques and optimizations they used to make their version better. You'll see many of these techniques in more than one article, so hopefully you'll have lots of chances to learn them thoroughly.

So, if you're an intermediate Swift developer I hope you'll get a lot out of the first part where we code methods ourselves. And if you're an advanced developer, I hope you'll get a lot out of the second part where we dig into the standard library code. You don't need to read these articles in order, but chances are it will really help because I'll reference back to earlier articles as we progress.

If you want to follow along with the standard library code, make sure you clone the Swift repository. It's available at https://github.com/apple/swift

OK – enough chat. Let's get into some code…

How `map()` works

Like I said, the map() method on sequences transforms elements in a sequence using a function we specify. So, we could make an array of numbers like this:

let numbers = [1, 2, 3, 4, 5]

And double them using map():

let doubled = numbers.map { $0 * 2 }

Or make an array of strings like this:

let strings = ["Apple", "Banana", "Pear"]

And uppercase them like this:

let uppercased = strings.map { $0.uppercased() }

This is such a common method in the Swift standard library – I think it's a great place for us to begin. As you've seen, map() is able to transform any kind of data into any other kind of data, which means our solution is going to need to use generics.

I cover these in detail in my book Pro Swift, but the short version is this: generics let us use placeholders for data types in methods so that we can call those methods in different ways. You can call these placeholders whatever you want, but it's common to see the letters T, U, and V.
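
If generics are new to you, here's a minimal sketch of the idea. The pair() function below is a made-up example of mine rather than anything from the standard library; T and U are just placeholders that Swift fills in with real types each time the function is called:

```swift
// Hypothetical helper for illustration only: T and U are placeholders
// that Swift replaces with concrete types at each call site.
func pair<T, U>(_ first: T, _ second: U) -> (T, U) {
    return (first, second)
}

let numberAndName = pair(1, "Apple")   // here T is Int and U is String
let flagAndScore = pair(true, 9.5)     // here T is Bool and U is Double
```

The same function body works for both calls, which is exactly the flexibility our own map() will need.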

Coding our own `map()`

Let's start off simple…

extension Sequence {
    public func map2<T>(
        _ transform: (Element) -> T
    ) -> [T] {
        var result = [T]()

        for item in self {
            result.append(transform(item))
        }

        return result
    }
}

To test that it works, try modifying the two map() calls from earlier so they use map2(), and we should get exactly the same output.
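
As a quick sketch of that test, here is our first map2() alongside the earlier examples, compared against the results from Swift's own map(). I've also added a range to show it isn't limited to arrays; that extra call is my addition rather than part of the earlier examples:

```swift
extension Sequence {
    // Our first, non-throwing version of map2() from above.
    func map2<T>(_ transform: (Element) -> T) -> [T] {
        var result = [T]()

        for item in self {
            result.append(transform(item))
        }

        return result
    }
}

let numbers = [1, 2, 3, 4, 5]
let strings = ["Apple", "Banana", "Pear"]

let doubled = numbers.map2 { $0 * 2 }
let uppercased = strings.map2 { $0.uppercased() }

// Extra check: a range is a sequence too, so map2() works on it as well.
let squares = (1...3).map2 { $0 * $0 }

// Both comparisons should print true if map2() matches map().
print(doubled == numbers.map { $0 * 2 })
print(uppercased == strings.map { $0.uppercased() })
```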

Now, one of the great things about map() is that it works just as well with a throwing function. So, we're going to modify ours to handle that, which means marking the transformation function with throws and calling it using try, like this:

extension Sequence {
    public func map2<T>(
        _ transform: (Element) throws -> T
    ) -> [T] {
        var result = [T]()

        for item in self {
            result.append(try transform(item))
        }

        return result
    }
}

But now we have a problem: because our transformation function throws, our whole method needs to throw – or we need to handle the error internally.

It's definitely not a good idea to handle the error internally, because it's not our problem – whoever calls map should figure out how to handle errors that are thrown.

So, we could mark our map2() method as being throwing. But that creates a second problem: every time we call map2() we need to use do, try, and catch, even with functions that don't throw.

The right thing to do here is to mark the function as rethrows rather than throws. This means "if the function that gets passed in throws, then map2() throws, but if the function that gets passed in doesn't throw then map2() doesn't throw either":

extension Sequence {
    public func map2<T>(
        _ transform: (Element) throws -> T
    ) rethrows -> [T] {
        var result = [T]()

        for item in self {
            result.append(try transform(item))
        }

        return result
    }
}
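
To see the difference rethrows makes at the call site, here's a small sketch. ParseError and parse() are hypothetical names I've made up purely for this demonstration:

```swift
extension Sequence {
    // The rethrows version of map2() from above.
    func map2<T>(_ transform: (Element) throws -> T) rethrows -> [T] {
        var result = [T]()

        for item in self {
            result.append(try transform(item))
        }

        return result
    }
}

// Hypothetical error type and throwing function, for illustration only.
enum ParseError: Error {
    case notANumber(String)
}

func parse(_ text: String) throws -> Int {
    guard let value = Int(text) else {
        throw ParseError.notANumber(text)
    }

    return value
}

// A non-throwing closure: no try, do, or catch needed.
let doubled = [1, 2, 3].map2 { $0 * 2 }

// A throwing function: now the call must be marked with try.
let parsed = try? ["1", "2", "3"].map2(parse)
let failed = try? ["1", "two", "3"].map2(parse)   // nil, because "two" throws
```

If we had used throws instead of rethrows, the first call would have needed try as well, even though its closure can never fail.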

How the standard library implements `map()`

That's enough for our solution – our code works just fine as a replacement for Swift's own map(): it's able to work with any kind of sequence, and it handles errors correctly. Now let's take a look at how the Swift standard library implements map.

All the standard library code is in stdlib/public/core, so start by changing directory into there. The map() method we're looking at is stored in Sequence.swift, so open that up in Xcode. Finally, press Cmd+F to search, look for func map, and you should see this:

public func map<T>(
    _ transform: (Element) throws -> T
) rethrows -> [T] {
    let initialCapacity = underestimatedCount
    var result = ContiguousArray<T>()
    result.reserveCapacity(initialCapacity)

    var iterator = self.makeIterator()

    // Add elements up to the initial capacity without checking for regrowth.
    for _ in 0..<initialCapacity {
        result.append(try transform(iterator.next()!))
    }
    // Add remaining elements, if any.
    while let element = iterator.next() {
        result.append(try transform(element))
    }
    return Array(result)
}

Immediately you can see they are also using rethrows for this method, so that it handles errors correctly. However, it has a few interesting differences from ours.

First, it uses Swift's ContiguousArray type, which is optimized for storing things consecutively in memory. This can be used here because it's about to transform every item in the sequence, and at the end of the method it actually gets converted back into a regular Array – you won't be using it over a long period of time.
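
Here's a minimal sketch of that same pattern in isolation: build up the results in a ContiguousArray, then convert back to a regular Array at the end. The values are arbitrary and just for illustration:

```swift
// Build the working storage as a ContiguousArray...
var working = ContiguousArray<Int>()

for number in 1...5 {
    working.append(number * number)
}

// ...then hand back a regular Array, just like map() does.
let squares = Array(working)
```

ContiguousArray has broadly the same API as Array, so code written against one usually works with the other.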

Second, it calls reserveCapacity() on the array, which makes sure the array is able to store approximately enough elements for all transformations to be stored. This is the kind of thing you should call as soon as you create your array, because it has a cost based on the number of items in the array.
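
You can try reserveCapacity() on a regular array to get a feel for it. This is only a sketch with an arbitrary size of 1,000; the exact capacity Swift picks may be larger than what you ask for:

```swift
var values = [Int]()

// Ask for room up front, while the array is still empty and cheap to resize.
values.reserveCapacity(1_000)
let capacityAfterReserve = values.capacity   // at least 1_000, possibly more

// These appends fit in the storage we already reserved,
// so the array should not need to grow along the way.
for number in 0..<1_000 {
    values.append(number)
}
```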

Third, look at the way it loops over the sequence:

for _ in 0..<initialCapacity {
    result.append(try transform(iterator.next()!))
}
// Add remaining elements, if any.
while let element = iterator.next() {
    result.append(try transform(element))
}

Why doesn't Swift just use a for loop like we did? Well, the answer lies in the use of underestimatedCount. Some types such as strings make it expensive to calculate the actual size of the sequence, because each letter in the string might actually be some complicated Unicode symbol.

underestimatedCount represents a value that's definitely equal to or lower than the actual size of the sequence, so every position below underestimatedCount is guaranteed to hold an element. You can see that the code actually force unwraps a large chunk of its loop, because it already knows the element definitely exists. Only once it's past the underestimated size does it start being careful, because it might hit the end at any point.

So, using underestimatedCount rather than count is a neat optimization: it helps when a sequence can cheaply report its full size, as arrays can, and does nothing when it can't.
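
To see both situations side by side, here's a small sketch. Countdown is a hypothetical sequence I've made up for illustration; because it doesn't provide its own underestimatedCount, it gets the default value of 0, whereas an array reports its full count:

```swift
// Hypothetical sequence for illustration: counts down from a starting value.
// It does not override underestimatedCount, so it uses the default of 0.
struct Countdown: Sequence, IteratorProtocol {
    var current: Int

    mutating func next() -> Int? {
        guard current > 0 else { return nil }
        defer { current -= 1 }
        return current
    }
}

let array = [10, 20, 30]
let countdown = Countdown(current: 3)

print(array.underestimatedCount)       // 3: the array knows its size
print(countdown.underestimatedCount)   // 0: no cheap estimate available

// map() works for both, it just can't reserve space up front for Countdown.
let incremented = array.map { $0 + 1 }
let scaled = countdown.map { $0 * 10 }
```

For the array, map() can reserve all three slots and use the fast first loop throughout; for Countdown, the first loop runs zero times and everything goes through the careful while let loop.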

Wrap up

That wraps up our first article: we looked at map(), we looked at throws and rethrows, we looked at contiguous arrays, reserving capacity, iterators, and more – we've crammed in a lot, but if you're finding it a little hard to understand fully don't worry: many of these techniques are used in the other standard library code we'll look at, so there's lots of opportunity to come back to it.

Now it's over to you: what did you learn in this article, and how could you apply it to your own code? Tweet me @twostraws and let me know!
