How to highlight text-to-speech words being read using AVSpeechSynthesizer

iOS has text-to-speech synthesis built right into the system. Even better, it tells you which word is about to be spoken, so you can highlight that word on screen. The AVSpeechSynthesizerDelegate protocol makes this easy. You implement two callbacks where you do your work: willSpeakRangeOfSpeechString, which is called just before each range of text is spoken, and didFinish, which is called once the whole utterance is done.
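Between them, the two callbacks manage one small piece of state: the range currently being spoken. The first callback sets it and the second clears it. The sketch below models that state without UIKit or audio, which can make the highlighting logic easier to reason about and test. SpeechHighlight, willSpeak(_:), didFinish() and segments are hypothetical names chosen for illustration. They are not part of AVFoundation.

```swift
import Foundation

// Hypothetical, UI-free model of what the two delegate callbacks do:
// willSpeakRangeOfSpeechString sets the current range, didFinish clears it.
struct SpeechHighlight {
    let text: String
    private(set) var highlighted: NSRange?

    init(text: String) {
        self.text = text
    }

    // Plays the role of speechSynthesizer(_:willSpeakRangeOfSpeechString:utterance:).
    mutating func willSpeak(_ characterRange: NSRange) {
        highlighted = characterRange
    }

    // Plays the role of speechSynthesizer(_:didFinish:).
    mutating func didFinish() {
        highlighted = nil
    }

    // Splits the text into the part before the spoken word, the word itself,
    // and the rest. With no valid range, everything lands in `before`.
    var segments: (before: String, word: String, after: String) {
        guard let nsRange = highlighted, let range = Range(nsRange, in: text) else {
            return (text, "", "")
        }
        return (String(text[..<range.lowerBound]),
                String(text[range]),
                String(text[range.upperBound...]))
    }
}

var highlight = SpeechHighlight(text: "Hello brave world")
highlight.willSpeak(NSRange(location: 6, length: 5))
print(highlight.segments.word)   // prints "brave"
highlight.didFinish()
print(highlight.segments.before) // prints "Hello brave world"
```

In a view controller, each delegate method could update a model like this and then render segments into the label. Keeping the state separate from the UI is purely a testability choice here. AVFoundation doesn't require it.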

First, add import AVFoundation to the top of your view controller's Swift file, then make the class conform to the AVSpeechSynthesizerDelegate protocol.

Add a label to your view controller, connect it to an outlet called label, then add these two methods:

func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
    // Rebuild the full text, then color only the range that is about to be spoken.
    let mutableAttributedString = NSMutableAttributedString(string: utterance.speechString)
    // Any UIColor works here; red is just an easy one to spot.
    mutableAttributedString.addAttribute(.foregroundColor, value: UIColor.red, range: characterRange)
    label.attributedText = mutableAttributedString
}

func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
    // Speech is done: show the text again with no highlight.
    label.attributedText = NSAttributedString(string: utterance.speechString)
}
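The characterRange passed to willSpeakRangeOfSpeechString is an NSRange into the utterance's speechString. Its offsets therefore count UTF-16 code units, not Swift Characters. If you need the spoken word itself, for example for logging or for a view that isn't a label, you can convert the range with Foundation's Range(_:in:). That keeps emoji and other multi-unit characters from throwing the offsets off. Here is a minimal sketch. spokenWord(in:characterRange:) is a hypothetical helper, not an AVFoundation API.

```swift
import Foundation

// Hypothetical helper: returns the text the synthesizer is about to speak.
// `characterRange` counts UTF-16 code units (it's an NSRange), so convert it
// with Range(_:in:) rather than offsetting by Characters.
func spokenWord(in speechString: String, characterRange: NSRange) -> String? {
    // Range(_:in:) returns nil if the range doesn't fit inside the string.
    guard let range = Range(characterRange, in: speechString) else { return nil }
    return String(speechString[range])
}

// "👋" is two UTF-16 code units, so "world" starts at offset 9, not 8.
let text = "Hello 👋 world"
print(spokenWord(in: text, characterRange: NSRange(location: 9, length: 5)) ?? "nil")
// prints "world"
```

Inside the delegate method you could call spokenWord(in: utterance.speechString, characterRange: characterRange).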

Finally, you need to trigger the text-to-speech engine. A button press is the obvious choice, but it's up to you. Here's the method I attached to a button:

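// Keep a strong reference to the synthesizer. As a local variable it can be
// deallocated before it finishes speaking, cutting the speech off.
let synthesizer = AVSpeechSynthesizer()

@IBAction func speak(_ sender: AnyObject) {
    // Avoid crashing on a nil label and skip empty text.
    guard let string = label.text, !string.isEmpty else { return }
    let utterance = AVSpeechUtterance(string: string)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")

    synthesizer.delegate = self
    synthesizer.speak(utterance)
}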

