

The pitfalls of string bridging

Swift's strings are designed to work flawlessly with languages around the world, but sometimes – just sometimes – you need to be careful using them. Let's explore why…


If you want to follow along with this tutorial, create a new macOS project using the Command Line Tool template.


Learning some geography

One of my favorite features of Swift's strings is that they have been designed for internationalization out of the box – they support English, French, German, Japanese, Chinese, and more. Some of the time this causes problems: some simple string manipulations that are easy in most languages are bizarrely hard in Swift.

The benefit is that when we need complex string manipulations – for example handling emoji – Swift makes them no harder than the simple ones, whereas in many other languages they are just broken because they expect strings to contain only ASCII characters.

However, Swift also builds on top of very old Objective-C libraries such as Apple's own Foundation framework, and this can cause all sorts of quirks and problems.

Let's look at some code. You're planning a road trip in North America, so you put some destinations into a string:

let vacation = "🇨🇦🇺🇸"

Those flags are emoji, but Swift manipulates them just like any other character. So, we can check whether our vacation includes Canada like this:

print(vacation.contains("🇨🇦"))

We can check whether our vacation includes the US like this:

print(vacation.contains("🇺🇸"))

And for a negative case, we can check whether our vacation includes Australia, like this:

print(vacation.contains("🇦🇺"))

So far, so good.

But let's have a little fun with our destination. We've just confirmed we definitely aren't going to Australia on our road trip, not least because it would be rather hard to drive to. What would happen, then, if we replaced Australia in our string with a different country?

We can try it out – we can replace Australia with Nicaragua, a country that's much more drivable:

print(vacation.replacingOccurrences(of: "🇦🇺", with: "🇳🇮"))

And now… our road trip contains China and Iceland?! Those definitely aren't drivable, but they also aren't Canada and the US, or even Australia or Nicaragua for that matter.

What gives?

The smart way to match strings

It's common to want to have some strings to compare, such as an array of names of books, and some search text provided by the user – you might want to filter the books to show only those that match the user's input.

For example:

let array = ["The Hobbit", "Lord of the Rings", "The Silmarillion"]
let search = "Sil"

let matches = array.filter { $0.contains(search) }
print(matches)

That will print one match, but it has a problem: most users don't really care about case sensitivity, and would expect the lowercase string "sil" to match as well.

The contains() method is case-sensitive, meaning that "sil" won't match "The Silmarillion", so sometimes you see code like this:

let matches = array.filter { $0.lowercased().contains(search.lowercased()) }

That's a little better, but it still has a problem: many languages have diacritics, such as the acute accent é in café or the ï sometimes written in naïve, and other languages have special characters in some localizations, such as the word "Straße" in Germany being written as "Strasse" in Switzerland because one country uses the Eszett character and the other doesn't.

So, this won't match:

let array = ["strasse"]
let search = "straße"
let matches = array.filter { $0.lowercased().contains(search.lowercased()) }

Swift doesn't contain code to match this kind of text, but Foundation does – it has a separate method called localizedStandardContains() that automatically ignores case, ignores diacritics, and automatically adapts to language variations such as the Eszett.

So, this code will work no matter which variant of German is being used:

let matches = array.filter { $0.localizedStandardContains(search) }

This localizedStandardContains() is described by Apple as being "the most appropriate method for doing user-level string searches," and in fact if you want to do string matching in SwiftData this is your only choice because at least right now it doesn't support lowercased() in its predicates.
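To see the three approaches side by side, here's a small sketch that runs one query through each of them. The title "Café Society" is a made-up entry added purely to exercise diacritics, and the last result assumes Foundation behaves as described above on your platform.

```swift
import Foundation

// "Café Society" is a hypothetical extra title, added to exercise diacritics.
let books = ["The Hobbit", "Lord of the Rings", "The Silmarillion", "Café Society"]
let query = "cafe"

// 1. Plain contains(): case-sensitive and diacritic-sensitive.
let exact = books.filter { $0.contains(query) }

// 2. Lowercasing both sides fixes case, but "é" still isn't "e".
let lowercasedMatches = books.filter { $0.lowercased().contains(query.lowercased()) }

// 3. Foundation's user-facing search ignores both case and diacritics.
let localized = books.filter { $0.localizedStandardContains(query) }

print(exact)
print(lowercasedMatches)
print(localized)
```

Only the third filter should find the café, which is why it's the better default for text typed by users.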

You're probably wondering why I've started talking about hobbits and the German language, so let's loop back to a check from earlier:

print(vacation.contains("🇦🇺"))

Our vacation doesn't include Australia, because we're only going to Canada and the US. But what if we had used localizedStandardContains() instead? Let's try that:

print(vacation.localizedStandardContains("🇦🇺"))

And that's going to return true. You thought the location was restricted to North America, but Foundation clearly has other plans!

Going down to Foundation

What's happening here is a disconnect between Swift and Foundation. Remember, Swift's strings are all designed for internationalization, with support for emoji baked in, but Foundation isn't quite as good.

Foundation doesn't use Swift's strings, but instead relies on a much older type called NSString. NSString is used in all of Apple's Objective-C APIs, such as AppKit and UIKit, and so Apple worked a bit of magic: they made Swift silently convert its strings to NSString when working with Objective-C APIs, so we could set button titles in UIKit without lots of workarounds.

This is a process known as bridging, and although Apple adjusted how it works a little over time the essence is that Swift's strings gain a whole bunch of extra functionality for free – we see method calls on strings that actually get executed on an Objective-C NSString, with Swift silently converting between the two types for us.

There isn't really much logic for which parts belong to Foundation and which belong to Swift. For example, these are from Foundation:

print(vacation.capitalized)
print(vacation.components(separatedBy: "\n"))
print(vacation.replacingOccurrences(of: "🇦🇺", with: "🇳🇮"))

Whereas these are from Swift:

print(vacation.lowercased())
print(vacation.split(separator: "\n"))
print(vacation.replacing("🇦🇺", with: "🇳🇮"))

So, perhaps now you might be starting to see where our problem lies: I deliberately used replacingOccurrences(of:) in my example code, which silently converts our Swift string to and from an Objective-C NSString for its replacement.

That might already give you an idea of what the underlying problem is, but if not let's try two more lines of code:

print(vacation.count)
print((vacation as NSString).length)

The first line prints the Swift string's character count, and the second explicitly converts the string to an NSString and prints the resulting length – it will print 2 for the Swift string, then 8 for the NSString.
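If you're curious where the 8 comes from, this sketch prints the same string through one more view. NSString measures its length in UTF-16 code units, so the utf16 view of a Swift string should report the same number. The constant name trip is just for this example, and it uses NSString(string:) rather than an as cast only so that it doesn't rely on implicit bridging.

```swift
import Foundation

let trip = "🇨🇦🇺🇸"

print(trip.count)                     // 2: user-visible characters
print(trip.utf16.count)               // 8: UTF-16 code units
print(NSString(string: trip).length)  // 8: NSString counts UTF-16 code units too
```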

A trip into Unicode

If you're still not sure what the problem is, consider this code:

let saying = "The rain in Spain falls mainly on the Spaniards"
print(saying[38])

That prints character 38 in the string, a capital S. Or at least it would if Swift supported reading characters inside strings using their integer offsets.

You see, that kind of code isn't allowed in Swift. It's not because it's tricky, because it's actually pretty easy with a small extension such as this one:

extension String {
    subscript(idx: Int) -> String {
        String(self[index(startIndex, offsetBy: idx)])
    }
}

However, Swift doesn't include such a thing by default because it's terrifically inefficient: looking for character 38 means starting at startIndex and counting forward 38 places.

This is intentional, and explains why Swift's string and NSString report back different character counts: complex emoji are actually made up of smaller emoji combined. For example, the emoji 👩🏻‍❤️‍💋‍👩🏿 combines symbols for woman, light skin tone, a red heart, a kiss mark, another woman, and a dark skin tone, stitched together with a hidden character called a zero-width joiner.
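As a rough illustration, here's that same emoji written out one Unicode scalar at a time using escapes, so the hidden pieces are visible. The constant name kiss is just for this sketch. You'll notice the list also includes a variation selector after the heart, which asks for the emoji presentation of that symbol.

```swift
// The kiss emoji from the text, spelled out scalar by scalar:
// woman, light skin tone, joiner, heart, variation selector, joiner,
// kiss mark, joiner, woman, dark skin tone.
let kiss = "\u{1F469}\u{1F3FB}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F48B}\u{200D}\u{1F469}\u{1F3FF}"

print(kiss.count)                 // 1: one user-visible character
print(kiss.unicodeScalars.count)  // 10: the individual building blocks
print(kiss.utf16.count)           // 15: what NSString would report as its length

// Print each building block with its code point and Unicode name.
for scalar in kiss.unicodeScalars {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    print("U+\(hex)", scalar.properties.name ?? "(unnamed)")
}
```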

Each of those is an individual Unicode scalar, so for Swift to read the whole character correctly it has no choice but to start at the beginning and count through each one step by step. If reading index 3 accidentally read out only half an emoji, it would cause all sorts of problems!
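If you'd rather not add an integer subscript, the same lookup can be written with String.Index directly, which at least makes the counting visible at the call site. This is a minimal sketch using the saying from above; the names position and tooFar are just for illustration.

```swift
let saying = "The rain in Spain falls mainly on the Spaniards"

// Walks forward 38 characters from the start, so this is O(n) rather than O(1).
let position = saying.index(saying.startIndex, offsetBy: 38)
print(saying[position]) // S

// The limitedBy: variant returns nil rather than crashing
// when the offset runs past the end of the string.
let tooFar = saying.index(saying.startIndex, offsetBy: 100, limitedBy: saying.endIndex)
print(tooFar == nil) // true
```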

And now hopefully you can see the issue: our vacation contains the Canadian flag and the United States flag side by side, which are both emoji created from subsymbols called regional indicator symbols. For Canada that's regional indicator symbol C followed by regional indicator symbol A, "CA", and for the United States it's regional indicator symbol U followed by regional indicator symbol S, "US".

If we put them together, we get the regional indicator symbols C, A, U, and S, and – you guessed it – the Australian flag is made from the regional indicator symbols A and U. So, from Swift's perspective we have a Canadian flag followed by an American flag, but from Objective-C's perspective – and therefore from Foundation's perspective – we have four separate letters that could be Canada then the US, or could be a C, Australia, and S.
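You can check this for yourself by turning each regional indicator symbol back into the plain letter it stands for. This is only a sketch: the names roadTrip, indicatorA, and letterA are introduced here, and the string is written with escapes so the four pieces are easy to see.

```swift
// The vacation string, written as its four regional indicator scalars.
let roadTrip = "\u{1F1E8}\u{1F1E6}\u{1F1FA}\u{1F1F8}"

// Regional indicator symbols A through Z occupy U+1F1E6 through U+1F1FF,
// so subtracting the first one gives an offset from the letter "A".
let indicatorA: UInt32 = 0x1F1E6
let letterA: UInt32 = 0x41

let letters = roadTrip.unicodeScalars.compactMap { scalar -> Character? in
    // Skip anything that isn't a regional indicator symbol.
    guard (0x1F1E6...0x1F1FF).contains(scalar.value) else { return nil }
    guard let ascii = Unicode.Scalar(scalar.value - indicatorA + letterA) else { return nil }
    return Character(ascii)
}

print(roadTrip.count)   // 2: Swift sees two flags
print(String(letters))  // CAUS: the four letters hiding inside them
```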

Now let's look at this line again:

print(vacation.replacingOccurrences(of: "🇦🇺", with: "🇳🇮"))

That's using Foundation API, so our vacation string will be treated as an NSString, and the hidden Australia flag will be replaced with Nicaragua's flag – AU will be replaced with NI.

That means our CAUS string will become CNIS, which in turn means suddenly we're visiting China (CN) and Iceland (IS).
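To compare the two side by side, here's a sketch that runs the same replacement through Swift's own replacing(_:with:) and through Foundation's replacingOccurrences(of:with:). It assumes a deployment target where replacing(_:with:) is available (macOS 13 or later), and the constant names are just for illustration. I've left the Foundation result as a plain print so you can see what your own platform does with it.

```swift
import Foundation

let plan = "🇨🇦🇺🇸"

// Swift's replacing(_:with:) compares whole characters,
// so it finds no Australian flag and leaves the string alone.
let swiftResult = plan.replacing("🇦🇺", with: "🇳🇮")
print(swiftResult == plan) // true

// Foundation's version goes through NSString, where the
// A and U in the middle of the string can be matched.
let foundationResult = plan.replacingOccurrences(of: "🇦🇺", with: "🇳🇮")
print(foundationResult)
```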

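So, the reason our code behaves badly is that Foundation's string APIs work on NSString's smaller units rather than on Swift's whole characters, which lets a search match across the boundary between two emoji. Yes, we took a bit of a roundabout route to get there, but I hope you learned something along the way!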

Using this to our advantage

Before we're done, let's at least look at one way we could – in theory, at least! – use this behavior to our advantage.

I have two daughters, and they both enjoy watching women's sports. So, let's say we had a whole bunch of sports emoji like this:

let sports = ["🤼‍♀️", "🤸🏽‍♀️", "🚴‍♀️", "🏊🏿‍♀️", "🏊🏻‍♂️", "🏄🏻‍♀️", "🧗‍♂️"]

How could we know which of those feature women? Well, if we treated each of those as a Swift string there's not much we can do, but if we treated each of them as an NSString then we could take advantage of the fact that each of those emoji contains a separate symbol, ♀, which is used to mean "woman":

for sport in sports {
    if sport.localizedStandardContains("♀") {
        print(sport)
    }
}

That uses localizedStandardContains() because it decomposes the emoji – normally a bad behavior, but actually helpful here!
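For comparison, a similar check can be sketched without Foundation at all, by looking through each string's Unicode scalars for the female sign. This is an alternative I'm adding for illustration rather than the approach above: femaleSign and sampleSports are hypothetical names, and the three entries are written with escapes so the joiner and the sign are visible.

```swift
// Each entry is: base emoji, optional skin tone, zero-width joiner,
// gender sign, variation selector.
let sampleSports = [
    "\u{1F6B4}\u{200D}\u{2640}\u{FE0F}",           // woman cycling
    "\u{1F3CA}\u{1F3FB}\u{200D}\u{2642}\u{FE0F}",  // man swimming, light skin tone
    "\u{1F3C4}\u{1F3FB}\u{200D}\u{2640}\u{FE0F}"   // woman surfing, light skin tone
]

// U+2640 is the female sign used inside these sequences.
let femaleSign: Unicode.Scalar = "\u{2640}"

let womensSports = sampleSports.filter { $0.unicodeScalars.contains(femaleSign) }
print(womensSports.count) // 2
```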

Now, I'm not saying that's a good idea, but I hope you've learned something new!
