Time Profiler and System Trace to the rescue!
Part 3 in a series of tutorials on Instruments:
Previously we’ve seen how Instruments can help us identify slow drawing code and detect leaks by monitoring memory allocations, but in this final part we’re going to be looking at how to detect slow code using two different instruments: time profiler and system trace.
You don’t need to have completed the two previous parts of this tutorial, but you do need to download my example Xcode project from GitHub. I should repeat the large and important warning…
Warning: That project is specifically written to be bad – please don’t use it as a learning exercise, other than as a way to find and fix problems using Instruments!
Once you have the project, please press Cmd+I to build and run the app using Instruments. This time please choose the Time Profiler instrument, which records the functions that chew up the most time when your app runs.
Please press the record button now, which should launch the app and start recording. You don’t need to use the app this time – just let it have a couple of seconds after it finishes launching, then press Stop.
SAVE 50% All our books and bundles are half price for Black Friday, so you can take your Swift knowledge further without spending big! Get the Swift Power Pack to build your iOS career faster, get the Swift Platform Pack to builds apps for macOS, watchOS, and beyond, or get the Swift Plus Pack to learn advanced design patterns, testing skills, and more.
At the top of time profiler you should see a graph marked CPU Usage. This shows you how hard the CPUs inside your iPad are working, and for this app that means you’ll see a large spike at the beginning followed by almost no work at all once the app has finished launching.
On the bottom of your time profiler window, you should see two things: on the left a tree structure containing things like “Main Thread” and “_dispatch_worker_thread3”, and on the right a stack trace. Both of these are useful, but when you’re just starting out the view on the right is most useful: it shows the “heaviest stack trace” – of all the parts of your code, what single thing took the most time?
It’s nearly always a good idea to after these heavy stack traces when looking for places to optimize, because even a small change can yield a big benefit. Lines written in gray belong to Apple’s code, whereas lines written in black belong to your code either directly or indirectly.
In this case, you should be able to scroll down a little to see some black method names: viewDidLoad()
, generateThumbnail()
, and UIImage.__allocating_init(named:)
. The numbers to the left represent how many milliseconds were spent there – in my case I let my app run for 2.67 seconds, of which 2312 milliseconds (2.312 seconds) were spent inside the generateThumbnail()
method.
This is clearly a performance hotspot, so let’s take a look at the offending code from ViewController.swift:
for headline in headlines {
let components = headline.components(separatedBy: "|||")
generateThumbnail(title: components[0], imageName: components[1])
}
That calls generateThumbnail
for all available headlines in the app. To get an accurate fix on exactly how long this takes, you can use the CFAbsoluteTimeGetCurrent()
function, like this:
let start = CFAbsoluteTimeGetCurrent()
for headline in headlines {
let components = headline.components(separatedBy: "|||")
generateThumbnail(title: components[0], imageName: components[1])
}
let end = CFAbsoluteTimeGetCurrent() - start
print("Took \(end) seconds")
CFAbsoluteTimeGetCurrent()
returns the number of seconds that have passed since January 1st 2001 as a floating-point number. For my iPad Pro, that printed 2.359 seconds – this is our benchmark.
OperationQueue
The code we’re dealing with is designed to combine headline text and accompanying image into a new rendered image, along with a black box to make the text more readable. None of these images depend on the others, so this is what’s called an embarrassingly parallelizable problem – a problem that can simply be split across multiple CPU cores without any extra work.
We can make this happen using OperationQueue
: we can schedule each generateThumbnail()
operation there, and iOS will take care of running them as quickly as possible based on system resources. We don’t want the meaning of our code to change, though, so we’re going to add a call to waitUntilAllOperationsAreFinished()
so that the program won’t behave as it does right now – it won’t continue until all the headlines are generated.
Change the code to this:
let queue = OperationQueue()
for headline in headlines {
queue.addOperation {
let components = headline.components(separatedBy: "|||")
self.generateThumbnail(title: components[0], imageName: components[1])
}
}
queue.waitUntilAllOperationsAreFinished()
With that relatively simple change, iOS is able to start using all the CPU cores inside your device. So, go ahead and run Time Profiler again and see how we did…
The simple, single-threaded version of our headline generator ran in 2.359 seconds. However, this so-called optimized version runs in 6.385 seconds – it’s about a third the speed. Rather than getting faster, our change has made the code significantly slower – how come?
To find out, we’re going to use one of the most powerful parts of Instruments called System Trace. The time profiler instrument we were using previously monitors where your program is spending your time, which is usually enough. In comparison, the system trace instrument monitors how your code execution is doing – what each thread is doing, even when that’s just idling.
So, press Cmd+I to run Instruments again, select System Trace, then press Choose. When you press record now, you’ll see “Windowed mode” – this means System Trace records so much information that it only tracks a moving window of time rather than everything that happens.
So, wait for your app to finish launch, then press Stop to let Instruments analyze the logged data. It might take a minute or two – this instrument really does track a lot of information.
This can make it rather hard to navigate, so I want to walk through it step by step. At the top of the Instruments window, directly below where it says “Run 1 of 1” in the center, you’ll see four buttons: All, Instruments, Threads, and CPUs. These are called strategies and are different ways of analyzing your app’s performance.
Let’s start with the CPU strategy, so please click CPUs now. This will give you a graphical representation of how much work each CPU was doing – my iPad Pro has three high-performance CPU cores, so Instruments will show me activity across cores 0, 1, and 2.
From a glance it looks like these cores are busy, but if you press Cmd and + five or six times to zoom in what you’ll see is quite a different story: rather than each CPU core working at the same time, they juggle the work between them.
To find out why change to the Threads strategy, which will show you what all threads – including those outside your app – were doing during your execution – you’ll see a lot of information. To help focus on what matters, enter “instrument” in the Track Filter box in the top-left corner of the window. This project is called “HowToInstrument”, so by using “instrument” here we’re mostly going to show our app’s threads.
Back in the CPU strategy we zoomed in a long way so you could see how they weren’t really multitasking. You can zoom back out now, and if you scroll downwards you’ll see lots of threads seemingly working hard. Many of them are called “_dispatch_worker_thread_3” or similar – these are the ones being created by OperationQueue
to handle all our headline image generation.
Select one of those threads, then look in the divider between the chart on top and the details below. You should see something like “_dispatch_worker_thread_3 > Narrative” – I’d like you to click the word Narrative and change to “Summary: Thread States” instead. This breaks down exactly what your selected thread was doing during its lifetime.
For example, the thread I selected said this:
If you haven’t seen this before, let me break down what those mean:
So, my thread existed for 3.72 seconds, but for 3.51 of those seconds it was blocked - it couldn’t do anything. It actually only spent just 53.51ms running, which is hardly anything. If you look at other threads you’ll see the same story: lots of being blocked, and not much actual running.
We’re getting close to finding the problem, but Instruments can get us even closer – it can tell us exactly what caused the blocking behavior to happen. To see this in action, go back to the Narrative view by clicking “Summary: Thread States” and selecting “Narrative”. Now select any of the narrative events that start with “Blocked for…”, and look on the right side where it says Backtrace.
What you’re seeing here is the backtrace of function calls that led up to the thread being blocked. As with the time profiler, code in grey is Apple’s system code, and code in black is code you called either directly or indirectly.
You should see this, in reverse order:
From there upwards are grey lines such as [UIImage imageNamed:]
, [UIImageAsset imageWithTraitCollection:]
and so on. But eventually you’ll see one gray line that starts with “CUIMutableStructuredThemeStore”, followed by “objc_sync_enter” – “synchronization” is a fancy way of stopping multiple threads accessing the same data, which is what causes our threads to spend so much time blocking.
What we’ve seen is that the last meaningful code to run involves “CUIMutableStructuredThemeStore”, deep inside UIKit. That name might already have set off alarm bells in your head, but if not let me explain.
Lots of data types can be read safely from multiple threads. In Objective-C – the language used to write all this UIImage
code – the following items are all safe to use from multiple threads:
We call those thread safe. There are many more, but you get the idea.
On the other hand, these types are not thread-safe – they should not be accessed by multiple threads at the same time:
Spot the difference? None of the thread-safe types have “Mutable” in their name, whereas all the unsafe types do.
The problem here is mutability: if you have a mutable dictionary and two threads try to add data at the same time, where do the items go? What if they both try to write using the same key? We don’t know, because it’s not safe – that’s genuinely dangerous code.
When you see “Mutable” in the name “CUIMutableStructuredThemeStore”, it’s a signal this is not safe to access across many threads. As a result, UIKit will synchronize accesses to it so that only one thread can use it at any given time.
So, we’re blocking because our code ultimately calls “CUIMutableStructuredThemeStore”, and that blocks because it must only be accessed by one thread at a time. The question is, what is it?
Take a look at this code from ViewController.swift:
guard let image = UIImage(named: thumb) else {
fatalError("Unable to load thumbnail")
}
The UIImage(named:)
initializer loads images from your bundle, but it also caches image files that it loads, so if you request them again they load much faster. This can be a major performance optimization for commonly used assets, but it has a problem: being able to read and write data from a shared source means this isn’t thread safe – it’s a mutable theme store.
As a result, this “performance optimization” is causing havoc with our performance: UIImage(named:)
attempts to load each image from its cache, finds that another thread is doing the same thing, and waits, and waits, and waits – it’s a huge waste of time.
Now that we’ve isolated the problem, we can implement a fix. Change the above code to this:
let path = Bundle.main.path(forResource: thumb, ofType: nil)
guard let image = UIImage(contentsOfFile: path) else {
fatalError("Unable to load thumbnail")
}
That uses the contentsOfFile
initializer for UIImage
, which doesn’t have the caching mechanism in place. We’ve made one small change, so go ahead and run your code again to see if it worked.
All being well you should see a huge performance improvement. If you remember, the original code took 2.359 on my iPad Pro, the first attempt at OperationQueue
took 6.385 seconds, but with the new UIImage
change in place this code takes just 0.471 seconds to complete – amazing!
This was the third and last installment in this tutorial series on Instruments, and we’ve covered a lot:
This code is still a long way from great – again, it really wasn’t designed to be a nice project! – but I hope you’ve managed to make it better while learning a lot about Instruments along the way.
SAVE 50% All our books and bundles are half price for Black Friday, so you can take your Swift knowledge further without spending big! Get the Swift Power Pack to build your iOS career faster, get the Swift Platform Pack to builds apps for macOS, watchOS, and beyond, or get the Swift Plus Pack to learn advanced design patterns, testing skills, and more.
Link copied to your pasteboard.