How to extract text from a PDF using PDFKit

Swift version: 5.1

Paul Hudson @twostraws

PDFKit comes with a built-in class called PDFDocument, which loads and parses PDF documents. You'll usually meet it when putting a PDF into a PDFView, but it's just as useful when you only want to read the text: loop over each page in the PDF, read its attributedString property, then append that to one attributed string holding all the text from the document.

Here’s some example code to do just that:

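import PDFKit

// `url` is assumed to be a URL pointing at your PDF
if let pdf = PDFDocument(url: url) {
    let pageCount = pdf.pageCount
    let documentContent = NSMutableAttributedString()

    // PDFKit page indexes start at 0, so loop from 0 up to (but not including) pageCount
    for i in 0 ..< pageCount {
        guard let page = pdf.page(at: i) else { continue }
        guard let pageContent = page.attributedString else { continue }
        documentContent.append(pageContent)
    }

    // documentContent now holds the text of every page
}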
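The snippet above builds the string but never hands it back, so in a real app you'll probably want it wrapped in a function. Here's a minimal sketch of that; the name `extractText(from:pageSeparator:)` and its `pageSeparator` parameter are hypothetical choices of mine, not part of PDFKit, and the separator simply stops the last line of one page running into the first line of the next.

```swift
import Foundation
import PDFKit

/// Loads the PDF at `url` and returns the text of every page as one attributed string.
/// Returns nil if PDFKit can't open the file.
/// `pageSeparator` is a hypothetical option for this sketch: it's inserted between pages.
func extractText(from url: URL, pageSeparator: String = "\n") -> NSAttributedString? {
    guard let pdf = PDFDocument(url: url) else { return nil }

    let documentContent = NSMutableAttributedString()

    // Page indexes are zero-based, so walk 0 ..< pageCount.
    for i in 0 ..< pdf.pageCount {
        guard let page = pdf.page(at: i),
              let pageContent = page.attributedString else { continue }

        // Only add a separator once some content has already been appended.
        if documentContent.length > 0 {
            documentContent.append(NSAttributedString(string: pageSeparator))
        }
        documentContent.append(pageContent)
    }

    return documentContent
}
```

Calling `extractText(from: someURL)?.string` then gives you the text with the formatting stripped, and a nil result tells you the file couldn't be opened as a PDF at all.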

Because the result is an attributed string, it retains the PDF's formatting as far as PDFKit is able to recover it.
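If you don't care about formatting, PDFDocument also has a `string` property that returns the document's text as a plain, optional String, so you can skip the page loop entirely. Here's a small sketch; `extractPlainText(from:)` is a hypothetical helper name, and I've chosen to return an empty string when the PDF opens but contains no text, keeping nil for "couldn't open the file".

```swift
import Foundation
import PDFKit

/// Returns the document's text with all formatting stripped,
/// or nil if PDFKit can't open the file at `url`.
func extractPlainText(from url: URL) -> String? {
    guard let pdf = PDFDocument(url: url) else { return nil }

    // PDFDocument.string is optional; treat "no text" as an empty string
    // so callers can tell it apart from a file that failed to load.
    return pdf.string ?? ""
}
```

If you've already built the attributed string using the loop above, reading its own `string` property is another way to throw the formatting away.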

Available from iOS 11.0 – learn more in my book Advanced iOS: Volume Two

About the Swift Knowledge Base

This is part of the Swift Knowledge Base, a free, searchable collection of solutions for common iOS questions.
