
Avoiding near-duplicates in sets

Decide for yourself what makes two objects equal

Paul Hudson (@twostraws)

Julian Schiavo writes, “I'm using a Set to make sure my array only contains uniques, but each object in the set has a date variable. This is causing duplicates when the data is refreshed and gets a different generation date. What would be the recommended solution for this?”

This is a great question, and Swift’s protocols help us make a really smart solution.

First, let’s take a look at some example code. Here’s an example NewsStory struct that tracks an ID, a title, plus a date:

import Foundation

struct NewsStory {
    var id: Int
    var title: String
    var date = Date()
}

As you can see, I’ve given date a default value, so each story is stamped with the current date at the moment it’s created.

Using that struct we can create three news stories like this:

let story1 = NewsStory(id: 1, title: "What's new in Swift 5.1?")
let story2 = NewsStory(id: 2, title: "What's new in Swift 6.0?")
let story3 = NewsStory(id: 3, title: "What's new in Swift 6.1?")

Julian wants to store those news stories in a set instead of an array, which is a sensible choice. So, we want to write code like this:

var stories = Set<NewsStory>()
stories.insert(story1)
stories.insert(story2)
stories.insert(story3)
print(stories)

That creates a set of news stories, adds our three, then prints out the set. However, that code won’t compile: Set requires its elements to conform to Hashable, so that it can compute a hash value for each news story and use it, along with ==, to detect duplicates.

Swift is really smart here, because if we add a Hashable conformance to a custom type that only has hashable properties, it can do the rest of the work for us to calculate the hash value of our type. So, we need to update the NewsStory struct to this:

struct NewsStory: Hashable {
    var id: Int
    var title: String
    var date = Date()
}

And now our code works – so far, so good.

However, Julian’s problem arises when the following code happens next:

let story4 = NewsStory(id: 1, title: "What's new in Swift 5.1?")
stories.insert(story4)
print(stories)

That creates another NewsStory instance with the same ID and title as an existing story, adds it to the set, then prints out the contents of the set – and now you’ll see four news stories in there, even though one is a duplicate.

As I said earlier, when you add a Hashable conformance to a type that consists only of Hashable properties, Swift will do all the work required to calculate the hash value for us, and it will synthesize the == comparison too. The rule it uses is simple: feed every property of our type into the hash, and treat two instances as equal only when every property is equal.

So, what’s happening here is that we see two news stories as being the same because they have the same ID and title, but Swift sees them as being different because it’s also factoring in the date, and the date will be different by a tiny amount.
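To see the date’s effect in isolation, here’s a small sketch that relies on the synthesized conformance. The SynthesizedStory name and the fixed dates are my own stand-ins, chosen only so the result doesn’t depend on timing.

```swift
import Foundation

// Hypothetical copy of NewsStory that relies on the synthesized Hashable conformance.
struct SynthesizedStory: Hashable {
    var id: Int
    var title: String
    var date = Date()
}

// Fixed dates one second apart, standing in for two separate data refreshes.
let firstFetch = Date(timeIntervalSince1970: 0)
let secondFetch = Date(timeIntervalSince1970: 1)

let original = SynthesizedStory(id: 1, title: "What's new in Swift 5.1?", date: firstFetch)
let refreshed = SynthesizedStory(id: 1, title: "What's new in Swift 5.1?", date: secondFetch)

// The synthesized == compares id, title, and date, so these are not equal.
print(original == refreshed)   // false

// Because they are not equal, the set keeps both of them.
let bothStories: Set = [original, refreshed]
print(bothStories.count)       // 2
```

Same ID, same title, and the set still holds two stories, purely because of the date.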

What we need to do is provide Swift with a custom equality rule – we need to say “these two stories are equal if their ID is the same, regardless of what their date is.”

To do that we need to implement two methods on NewsStory: one to generate a custom hash value, and one to check whether two news stories are equal.

The first method uses Swift’s Hasher struct to use only the ID of the news story to generate the hash value:

func hash(into hasher: inout Hasher) {
    hasher.combine(id)
}

The second uses operator overloading to implement a custom == comparison for news stories that checks only that the IDs are equal:

static func ==(lhs: NewsStory, rhs: NewsStory) -> Bool {
    return lhs.id == rhs.id
}

And boom! We’re done. As a bonus, this version does less work than Swift’s automatically synthesized methods, because we now hash and compare only the one value that matters rather than every property.

Of course, we used only the id property here, but you can hash and compare as many properties as you need to tell your instances apart. Just make sure both methods use the same properties, because two instances that are equal must also produce the same hash value.
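As a rough sketch of what using more than one property might look like, here’s a variant where two stories match only if both the ID and the title match. TitledStory is a hypothetical name I’ve made up for this example; the important part is that hash(into:) and == use exactly the same properties.

```swift
import Foundation

// Hypothetical variant: stories are equal only when id AND title both match.
struct TitledStory: Hashable {
    var id: Int
    var title: String
    var date = Date()

    // Feed the same properties into the hasher that == compares.
    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
        hasher.combine(title)
    }

    static func ==(lhs: TitledStory, rhs: TitledStory) -> Bool {
        return lhs.id == rhs.id && lhs.title == rhs.title
    }
}

let first = TitledStory(id: 1, title: "What's new in Swift 5.1?")
let sameAgain = TitledStory(id: 1, title: "What's new in Swift 5.1?")
let retitled = TitledStory(id: 1, title: "What's new in Swift 6.0?")

// Same id and title: treated as one story, whatever the dates are.
// Same id but a different title: treated as a separate story.
let titledStories: Set = [first, sameAgain, retitled]
print(titledStories.count)   // 2
```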

Here’s the finished code for the NewsStory struct:

struct NewsStory: Hashable {
    var id: Int
    var title: String
    var date = Date()

    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }

    static func ==(lhs: NewsStory, rhs: NewsStory) -> Bool {
        return lhs.id == rhs.id
    }
}
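To check that the finished struct behaves the way we want, here’s a quick run-through of the earlier steps in one place. The insertResult name is just something I’ve added for this sketch.

```swift
import Foundation

struct NewsStory: Hashable {
    var id: Int
    var title: String
    var date = Date()

    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }

    static func ==(lhs: NewsStory, rhs: NewsStory) -> Bool {
        return lhs.id == rhs.id
    }
}

var stories = Set<NewsStory>()
stories.insert(NewsStory(id: 1, title: "What's new in Swift 5.1?"))
stories.insert(NewsStory(id: 2, title: "What's new in Swift 6.0?"))
stories.insert(NewsStory(id: 3, title: "What's new in Swift 6.1?"))

// Same id as the first story, created later so its date differs.
let story4 = NewsStory(id: 1, title: "What's new in Swift 5.1?")
let insertResult = stories.insert(story4)

print(insertResult.inserted)   // false, because a story with id 1 is already present
print(stories.count)           // 3
```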

Now, before we’re done I want to add an important reminder from Rob Napier: “Equality implies substitutability – any two instances that compare equally can be used interchangeably in any code that depends on their values. If you’re only comparing id, then you mean ‘if two things have the same id, and other values are different, I never care which one I have and algorithms are always free to return me either.’”

That last part is particularly important: when two objects are considered identical because our custom == method returns true, Swift is free to choose either. It might always choose the first, it might always choose the second, or it might choose a different one each time – and that behavior might change in future versions of Swift. Remember, we've told Swift that the objects are the same, so if the choice of object matters you've got a problem!
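If the choice does matter for a set, it’s safer to say what you want explicitly than to rely on whatever happens by default. As a sketch: Set’s insert(_:) is documented to have no effect when an equal element is already present, while update(with:) inserts unconditionally and hands back the element it replaced. The fixed dates and variable names below are my own, used only to make the two stories distinguishable.

```swift
import Foundation

struct NewsStory: Hashable {
    var id: Int
    var title: String
    var date = Date()

    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }

    static func ==(lhs: NewsStory, rhs: NewsStory) -> Bool {
        return lhs.id == rhs.id
    }
}

// Two "equal" stories with deliberately different dates.
let older = NewsStory(id: 1, title: "What's new in Swift 5.1?", date: Date(timeIntervalSince1970: 0))
let newer = NewsStory(id: 1, title: "What's new in Swift 5.1?", date: Date(timeIntervalSince1970: 60))

var stories: Set = [older]

// insert(_:) leaves the existing member alone and reports what the set holds.
let result = stories.insert(newer)
print(result.inserted)                               // false
print(result.memberAfterInsert.date == older.date)   // true

// update(with:) replaces the stored member and returns the one it displaced.
let replaced = stories.update(with: newer)
print(replaced?.date == older.date)                  // true
print(stories.first?.date == newer.date)             // true
```

So if you want a refresh to overwrite the stored story with the newest data, update(with:) says so directly; if you want to keep the first one you saw, insert(_:) does that.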


About the author

Paul Hudson is the creator of Hacking with Swift, the most comprehensive series of Swift books in the world. He's also the editor of Swift Developer News, the maintainer of the Swift Knowledge Base, and a speaker at Swift events around the world. If you're curious you can learn more here.
