XML description not returned

I've got this example code:

import Foundation

class XMLNode {
    let tag: String
    var data: String
    let attributes: [String: String]
    var childNodes: [XMLNode]

    init(tag: String, data: String, attributes: [String: String], childNodes: [XMLNode]) {
        self.tag = tag
        self.data = data
        self.attributes = attributes
        self.childNodes = childNodes
    }

    func getAttribute(_ name: String) -> String? {
        attributes[name]
    }

    func getElementsByTagName(_ name: String) -> [XMLNode] {
        var results = [XMLNode]()

        for node in childNodes {
            if node.tag == name {
                results.append(node)
            }

            results += node.getElementsByTagName(name)
        }
        return results
    }
}

class MicroDom: NSObject, XMLParserDelegate {
    private let parser: XMLParser
    private var stack = [XMLNode]()
    private var tree: XMLNode?

    init(data: Data) {
        parser = XMLParser(data: data)
        super.init()
        parser.delegate = self
    }

    func parse() -> XMLNode? {
        parser.parse()

        guard parser.parserError == nil else {
            return nil
        }
        return tree
    }

    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String: String] = [:]) {
        let node = XMLNode(tag: elementName, data: "", attributes: attributeDict, childNodes: [])
        stack.append(node)
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        let lastElement = stack.removeLast()

        if let last = stack.last {
            last.childNodes += [lastElement]
        } else {
            tree = lastElement
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        stack.last?.data = string
    }

}

// EXAMPLE CODE to call the xml parser
//

guard let url = URL(string: "http://www.boardgamegeek.com/xmlapi/boardgame/285") else { fatalError("Invalid URL") }
do {
    let contents = try String(contentsOf: url)
    let dom = MicroDom(data: Data(contents.utf8))
    let tree = dom.parse()
    //print(tree?.tag ?? "")
    let search = "description"
    if let tags = tree?.getElementsByTagName(search) {
        //print(tags)
        for tag in tags {
            print(tag.data)
        }
    }
    //print(contents)
} catch {
    print("error loading")
}

If I search for the element "name", I get the correct value back. If I search for an element that does not exist, I get nothing, as expected. If I search for the element "description", I get only a closing chevron (>) back. Searching for other elements that do exist seems to work as expected. Does anyone have any idea why the search for "description" behaves differently?

What does your XML data look like?

<boardgames termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
                    <boardgame objectid="285">
            <yearpublished>1998</yearpublished>
            <minplayers>1</minplayers>
            <maxplayers>4</maxplayers>
            <playingtime>90</playingtime>
            <minplaytime>90</minplaytime>
            <maxplaytime>90</maxplaytime>
            <age>10</age>

                            <name primary="true" sortindex="1">Monsters Ravage America</name>

            <description>Each player is one of six Giant Monsters, stomping across a map of the USA in search of cities to destroy. The monsters run the gamut from the classic (enormous lizards and gargantuan apes) to the slightly more unconventional (50-foot glow worms, anyone?), and each has its own set of attributes and powers.&lt;br/&gt;&lt;br/&gt;Ingeniously, everyone also controls an arm of the military, which can be used to attack and weaken the other monsters on the board. It's a losing battle for the militia -- monsters can wipe out military bases and reduce the total number of forces available to each player -- but you can get a few good licks in early in the game.&lt;br/&gt;&lt;br/&gt;The battle escalates until the 20th city is reduced to rubble. At that time, all the monsters duke it out in a free-for-all, mano-a-clawo Monster Challenge; and when the dust settles, the last monster standing is declared the winner.&lt;br/&gt;&lt;br/&gt;This game was remade into Monsters Menace America.&lt;br/&gt;&lt;br/&gt;</description>

                            <thumbnail>https://cf.geekdo-images.com/thumb/img/0-jn2SdeXl-FBNOeRHB5YB-pWyY=/fit-in/200x150/filters:strip_icc()/pic265330.jpg</thumbnail>
                <image>https://cf.geekdo-images.com/original/img/MpbbhZNsrhC5T3S55PLSMLxJyUU=/0x0/pic265330.jpg</image>

                                                <boardgamepublisher objectid="5">The Avalon Hill Game Co</boardgamepublisher>
                                                            <boardgameartist objectid="12261">Jean Baer</boardgameartist>
                                                            <boardgamedesigner objectid="56">J. C. Connors (I)</boardgamedesigner>
                                                            <boardgamefamily objectid="5615">Creatures: Monsters</boardgamefamily>
                                                            <boardgameversion objectid="26652">First edition</boardgameversion>
                                                            <boardgamedesigner objectid="181">Ben Knight</boardgamedesigner>
                                                            <boardgameartist objectid="181">Ben Knight</boardgameartist>
                                                            <boardgameartist objectid="20628">Tyson Milbert</boardgameartist>
                                                            <boardgameimplementation objectid="17835">Monsters Menace America</boardgameimplementation>
                                                                                                                    <boardgamecategory objectid="1016">Science Fiction</boardgamecategory>
                                                            <boardgameartist objectid="110380">Dave Shaw (II)</boardgameartist>
                                                            <boardgamesubdomain objectid="5496">Thematic Games</boardgamesubdomain>
                                                            <boardgamefamily objectid="24569">Theme: Kaiju</boardgamefamily>
                                                            <boardgamecategory objectid="1019">Wargame</boardgamecategory>

            <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="7">

        <results numplayers="1">        
                    <result value="Best" numvotes="0" />
                    <result value="Recommended" numvotes="1" />
                    <result value="Not Recommended" numvotes="4" />
                </results>                  

        <results numplayers="2">        
                    <result value="Best" numvotes="0" />
                    <result value="Recommended" numvotes="1" />
                    <result value="Not Recommended" numvotes="4" />
                </results>                  

        <results numplayers="3">        
                    <result value="Best" numvotes="0" />
                    <result value="Recommended" numvotes="4" />
                    <result value="Not Recommended" numvotes="2" />
                </results>                  

        <results numplayers="4">        
                    <result value="Best" numvotes="7" />
                    <result value="Recommended" numvotes="0" />
                    <result value="Not Recommended" numvotes="0" />
                </results>                  

        <results numplayers="4+">       
                    <result value="Best" numvotes="0" />
                    <result value="Recommended" numvotes="1" />
                    <result value="Not Recommended" numvotes="3" />
                </results>                  
    </poll>

            <poll name="language_dependence" title="Language Dependence" totalvotes="2">

        <results>       
                    <result level="1" value="No necessary in-game text" numvotes="0" />
                    <result level="2" value="Some necessary text - easily memorized or small crib sheet" numvotes="0" />
                    <result level="3" value="Moderate in-game text - needs crib sheet or paste ups" numvotes="0" />
                    <result level="4" value="Extensive use of text - massive conversion needed to be playable" numvotes="2" />
                    <result level="5" value="Unplayable in another language" numvotes="0" />
                </results>                  
    </poll>

            <poll name="suggested_playerage" title="User Suggested Player Age" totalvotes="2">
            <results>       
                    <result value="2" numvotes="0" />
                    <result value="3" numvotes="0" />
                    <result value="4" numvotes="0" />
                    <result value="5" numvotes="0" />
                    <result value="6" numvotes="0" />
                    <result value="8" numvotes="0" />
                    <result value="10" numvotes="1" />
                    <result value="12" numvotes="0" />
                    <result value="14" numvotes="1" />
                    <result value="16" numvotes="0" />
                    <result value="18" numvotes="0" />
                    <result value="21 and up" numvotes="0" />
                </results>                  
    </poll>

        </boardgame>
    </boardgames>

The entities encoding HTML br tags in the description tag are causing the problem. I would suggest perhaps running a find/replace on your raw XML before passing it to MicroDom for parsing:

    var contents = try String(contentsOf: url)
    contents = contents.replacingOccurrences(of: "&lt;br/&gt;", with: "\n")
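
A related point worth checking, offered as a sketch rather than a confirmed diagnosis for this feed: Apple's documentation for parser(_:foundCharacters:) notes that the parser may call it several times for a single element, and that the delegate should append each chunk. Assigning with = keeps only the last chunk, which would be consistent with getting just ">" back when the text ends in &gt;. The TextCollector class below is a hypothetical stand-in for MicroDom, reduced to text collection only.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // XMLParser lives in this module on Linux
#endif

// Hypothetical stand-in for MicroDom: it only gathers character data.
final class TextCollector: NSObject, XMLParserDelegate {
    private(set) var chunks = [String]()   // each piece the parser reports
    private(set) var text = ""             // the pieces joined together

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        chunks.append(string)
        text += string   // append each chunk instead of overwriting with "="
    }
}

let xml = "<description>One&lt;br/&gt;Two</description>"
let collector = TextCollector()
let parser = XMLParser(data: Data(xml.utf8))
parser.delegate = collector
parser.parse()

print(collector.chunks)   // shows how the text was split, which can vary
print(collector.text)     // the full element text
```

If this is what is happening, the equivalent change in MicroDom would be to use += instead of = in parser(_:foundCharacters:). Undeclared HTML entities such as &mdash; would still need handling before parsing.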

Thanks Roosterboy, that was exactly the problem. Can I ask how you knew that was the issue? I've since discovered a whole lot of formatting problems that can cause the description to not return correctly, so I have created this extension for the ones I've discovered so far:

extension String {
    func removeXML() -> String {
        return self.replacingOccurrences(of: "&lt;br/&gt;", with: "\n")
            .replacingOccurrences(of: "&amp;mdash;", with: "-")
            .replacingOccurrences(of: "&#10;", with: "\n")
            .replacingOccurrences(of: "&mdash;", with: " - ")
            .replacingOccurrences(of: "&amp;ndash;", with: " - ")
            .replacingOccurrences(of: "&nbsp;", with: " ")
            .replacingOccurrences(of: "&amp;quot;", with: "'")
            .replacingOccurrences(of: "&ldquo;", with: "'")
            .replacingOccurrences(of: "&rdquo;", with: "'")
            .replacingOccurrences(of: "&amp;rdquo;", with: "'")
            .replacingOccurrences(of: "&rsquo;", with: "'")
            .replacingOccurrences(of: "&amp;rsquo;", with: "'")
            .replacingOccurrences(of: "&#239;", with: "a")
            .replacingOccurrences(of: "&#130;", with: "f")
            .replacingOccurrences(of: "&#128;", with: "€")
            .replacingOccurrences(of: "&#172", with: "u")
            .replacingOccurrences(of: "&ugrave;", with: "u")
            .replacingOccurrences(of: "&#197;", with: "o")
            .replacingOccurrences(of: "&#141;", with: "n")
            .replacingOccurrences(of: "&ntilde;", with: "n")
            .replacingOccurrences(of: "à", with: "a")
            .replacingOccurrences(of: "&amp;auml;", with: "a")
            .replacingOccurrences(of: "&amp;ouml", with: "o")
            .replacingOccurrences(of: "º", with: "#")
            .replacingOccurrences(of: "&amp;amp;", with: "and")
            .replacingOccurrences(of: "–", with: "-")
    }
}

Can't say I'm very happy with this solution though. Is there a better way?

AFAIK, there really isn't a better solution if you want to stick with XMLParser. You would think that parser(_:resolveExternalEntityName:systemID:) would be useful here, maybe with something like this:

    private let entities: [String:String] = [
        "rsquo": "'",
        "mdash": "—",
        //etc...
    ]

    func parser(_ parser: XMLParser, resolveExternalEntityName name: String, systemID: String?) -> Data? {
        return entities[name, default: ""].data(using: .utf8)
    }

but it's not. No matter what I try, XMLParser just stops parsing with an NSXMLParserErrorDomain error 26 after calling this method with the first encountered entity and the node tree comes back nil. From what I can tell through some searches online, this is a known and long-standing issue with (NS)XMLParser.

So, unless I've missed something, your options boil down to:

  1. Clean up the XML at the source,
  2. Clean up the XML before you parse it, or
  3. Use a different parser

Option 1 probably is not possible since you are consuming someone else's data. Option 2 could get really expensive, since every call to replacingOccurrences(of:with:) scans the whole string again. Option 3 could be doable, as there are a number of 3rd party XML parsers available on GitHub. You'd have to research them to find one that best suits your needs.

An addendum to Option 2: You could probably speed up performance by not using repeated calls to replacingOccurrences(of:with:) but instead parsing the string in a loop and replacing the entities as you go. That would at least mean you only looped through the string once instead of once for each replacement. Something like this gist, in other words.
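
The single-pass idea could look roughly like the sketch below. It is not the gist's code. The namedEntities table and the replacingNamedEntities(in:using:) function are hypothetical names, and the table only covers a handful of entities as an illustration.

```swift
import Foundation

// Hypothetical lookup table; extend it with whatever entities the feed uses.
let namedEntities: [String: String] = [
    "mdash": "\u{2014}", "ndash": "\u{2013}", "nbsp": " ",
    "rsquo": "\u{2019}", "ldquo": "\u{201C}", "rdquo": "\u{201D}",
    "ntilde": "ñ", "ugrave": "ù"
]

/// Walks the string once, copying text through and swapping each known
/// `&name;` entity for its replacement. Anything not in the table,
/// including XML built-ins such as `&lt;` and `&amp;`, is copied unchanged.
func replacingNamedEntities(in input: String, using table: [String: String]) -> String {
    var output = ""
    output.reserveCapacity(input.utf8.count)
    var rest = Substring(input)

    while let amp = rest.firstIndex(of: "&") {
        output.append(contentsOf: rest[..<amp])      // text before the "&"
        let afterAmp = rest.index(after: amp)
        // Only look a short distance ahead for the closing ";".
        let window = rest[afterAmp...].prefix(12)
        if let semi = window.firstIndex(of: ";"),
           let replacement = table[String(rest[afterAmp..<semi])] {
            output.append(contentsOf: replacement)   // known entity: swap it
            rest = rest[rest.index(after: semi)...]
        } else {
            output.append("&")                       // unknown: keep the "&"
            rest = rest[afterAmp...]
        }
    }
    output.append(contentsOf: rest)                  // whatever is left over
    return output
}

let cleaned = replacingNamedEntities(in: "a&mdash;b &lt;br/&gt; c&nbsp;d", using: namedEntities)
print(cleaned)
```

The built-in XML entities and numeric references such as &#10; are deliberately left alone here, on the assumption that XMLParser decodes those itself. The raw contents string would go through this function once before being handed to MicroDom.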

Aha, I may have hit on a solution after waking from a relaxing nap. But I had to run a quick errand so I haven't finished testing it yet.

It involves prefixing a doctype and entity declarations to the XML and then using the delegate method parser(_:foundInternalEntityDeclarationWithName:value:) to replace them.

Give me some time and I will finish my tests and report back.

Ugh, never mind. That didn't work.

Thanks for your help, Roosterboy.
