Corinthia / COR-20

Write an XML/HTML parser


Details

    Description

      Currently we rely on libxml2 and HTML Tidy for parsing XML and HTML, respectively. In both cases we use only the libraries' parsing functions, not their other features such as the DOM tree.

      Parsing XML is not very difficult. HTML is somewhat harder, because of the ambiguities that arise from the poorly defined parsing rules in earlier versions of the spec ("make a best effort" became "replicate what Internet Explorer does" because almost every site violated the rules). However, the HTML5 spec now defines a proper parsing algorithm that resolves these ambiguities. We will also need to take into account which tags require a corresponding close tag and which do not.

      Having our own parser will simplify our dependencies considerably, particularly by removing the somewhat awkward HTML Tidy.

          People

            Assignee: Peter Kelly (pmkelly)
            Reporter: Peter Kelly (pmkelly)
            Votes: 0
            Watchers: 2