Tika / TIKA-128

HTML parser should produce XHTML SAX events


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2
    • Component/s: parser
    • Labels: None

    Description

      The current HTML parser just sanitizes the input HTML and passes it forward with no structural changes.

      Unfortunately, this is incompatible with the other Tika parsers, which all produce XHTML output, so IMHO the HTML parser should also output XHTML.
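
      To illustrate what "XHTML SAX events" could look like, here is a minimal sketch that uses only the JDK's org.xml.sax API. It is not Tika's actual implementation: the class XhtmlEventSketch and its methods emitXhtml and record are hypothetical names made up for this example, and the fixed html/head/title/body/p skeleton is an assumption about the kind of structure the parsers are expected to emit.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not part of Tika: emits a tiny XHTML document
// as SAX events in the XHTML namespace.
final class XhtmlEventSketch {

    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Sends a fixed XHTML skeleton (title plus one paragraph) to the handler.
    static void emitXhtml(ContentHandler handler, String title, String text)
            throws SAXException {
        Attributes none = new AttributesImpl();
        handler.startDocument();
        handler.startPrefixMapping("", XHTML);
        handler.startElement(XHTML, "html", "html", none);
        handler.startElement(XHTML, "head", "head", none);
        handler.startElement(XHTML, "title", "title", none);
        handler.characters(title.toCharArray(), 0, title.length());
        handler.endElement(XHTML, "title", "title");
        handler.endElement(XHTML, "head", "head");
        handler.startElement(XHTML, "body", "body", none);
        handler.startElement(XHTML, "p", "p", none);
        handler.characters(text.toCharArray(), 0, text.length());
        handler.endElement(XHTML, "p", "p");
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endPrefixMapping("");
        handler.endDocument();
    }

    // Records element and character events as strings so they can be inspected.
    static List<String> record(String title, String text) {
        List<String> events = new ArrayList<>();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                events.add("<" + local + ">");
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("</" + local + ">");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add(new String(ch, start, length));
            }
        };
        try {
            emitXhtml(recorder, title, text);
        } catch (SAXException e) {
            // The recording handler never throws; rethrow unchecked just in case.
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Prints the recorded events joined into one line.
        System.out.println(String.join("", record("Example", "Hello")));
    }
}
```

      Any downstream ContentHandler that already consumes XHTML events from the other parsers could then, in principle, treat HTML input the same way.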

          People

            Assignee: Jukka Zitting (jukkaz)
            Reporter: Jukka Zitting (jukkaz)