Tika / TIKA-128

HTML parser should produce XHTML SAX events

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2
    • Component/s: parser
    • Labels:
      None

      Description

      The current HTML parser just sanitizes the input HTML and passes it forward with no structural changes.

      Unfortunately, this is incompatible with the other Tika parsers, which produce XHTML output, so IMHO the HTML parser should also output XHTML.
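      For illustration, here is a minimal sketch of what "producing XHTML SAX events" means. It uses only the JDK's standard org.xml.sax API and is not Tika's actual implementation. Every element is emitted in the XHTML namespace (http://www.w3.org/1999/xhtml) inside an html/head/body skeleton, rather than passing the sanitized input structure through unchanged. The class names XhtmlEmitter and RecordingHandler are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical helper (not a Tika class): sends a minimal XHTML document
 * (html/head/title/body/p) to a downstream SAX ContentHandler, with every
 * element in the XHTML namespace.
 */
class XhtmlEmitter {
    static final String XHTML = "http://www.w3.org/1999/xhtml";
    private final ContentHandler out;

    XhtmlEmitter(ContentHandler out) {
        this.out = out;
    }

    // Start an XHTML element with no attributes.
    private void start(String name) throws SAXException {
        out.startElement(XHTML, name, name, new AttributesImpl());
    }

    // Close an XHTML element.
    private void end(String name) throws SAXException {
        out.endElement(XHTML, name, name);
    }

    // Emit character data.
    private void text(String s) throws SAXException {
        out.characters(s.toCharArray(), 0, s.length());
    }

    /** Emits the title in head, then one p element per paragraph in body. */
    void emit(String title, List<String> paragraphs) throws SAXException {
        out.startDocument();
        out.startPrefixMapping("", XHTML); // XHTML as the default namespace
        start("html");
        start("head");
        start("title");
        text(title);
        end("title");
        end("head");
        start("body");
        for (String p : paragraphs) {
            start("p");
            text(p);
            end("p");
        }
        end("body");
        end("html");
        out.endPrefixMapping("");
        out.endDocument();
    }
}

/** Hypothetical handler that records element and text events as strings. */
class RecordingHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        events.add("start " + uri + " " + local);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.add("end " + local);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("text " + new String(ch, start, length));
    }
}
```

      Because downstream consumers only see a well-formed XHTML event stream, they can treat HTML input the same way as output from any other parser that emits XHTML.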

            People

            • Assignee:
              jukkaz Jukka Zitting
              Reporter:
              jukkaz Jukka Zitting
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved: