Tika / TIKA-190

Wrong handling of ignorableWhitespace/characters in SafeContentHandler and WriteoutContentHandler


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3
    • Fix Version/s: 0.3
    • Component/s: None
    • Labels: None

    Description

      While investigating TIKA-189, I found the following:
      The TIKA-188 patch produces correct output, but its internal handling is incorrect. XHTMLContentHandler emits the tabs and newlines as ignorableWhitespace() events, but its superclass SafeContentHandler has a bug (a copy-and-paste error): it forwards ignorableWhitespace() to the decorated handler's characters() event instead of to its ignorableWhitespace() event. Once this is fixed, the tests fail, because WriteoutContentHandler does not implement ignorableWhitespace() and therefore drops all of that whitespace.
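
      The interaction described above can be sketched in isolation. The classes below are hypothetical stand-ins written only for illustration (ForwardingHandler for a decorator like SafeContentHandler, WritingHandler for a writer-backed handler like WriteoutContentHandler); they are not Tika's actual code, and the two boolean flags exist only to switch between the buggy and the corrected behaviour. Only the standard org.xml.sax API is assumed.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical decorator (stand-in for something like SafeContentHandler).
class ForwardingHandler extends DefaultHandler {
    private final ContentHandler delegate;
    private final boolean buggy; // illustration-only switch

    ForwardingHandler(ContentHandler delegate, boolean buggy) {
        this.delegate = delegate;
        this.buggy = buggy;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        delegate.characters(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        if (buggy) {
            // Copy-and-paste error: the wrong event is forwarded.
            delegate.characters(ch, start, length);
        } else {
            delegate.ignorableWhitespace(ch, start, length);
        }
    }
}

// Hypothetical sink (stand-in for something like WriteoutContentHandler).
class WritingHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final boolean handlesIgnorable; // illustration-only switch

    WritingHandler(boolean handlesIgnorable) {
        this.handlesIgnorable = handlesIgnorable;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // Without an override, DefaultHandler silently ignores this event;
        // the flag emulates both situations.
        if (handlesIgnorable) {
            out.append(ch, start, length);
        }
    }

    @Override
    public String toString() {
        return out.toString();
    }
}

class WhitespaceDemo {
    // Sends "a", an ignorable newline, then "b" through the decorator
    // and returns what the sink collected.
    static String render(boolean buggyForwarding, boolean sinkHandlesIgnorable) {
        WritingHandler sink = new WritingHandler(sinkHandlesIgnorable);
        ContentHandler handler = new ForwardingHandler(sink, buggyForwarding);
        try {
            handler.characters("a".toCharArray(), 0, 1);
            handler.ignorableWhitespace("\n".toCharArray(), 0, 1);
            handler.characters("b".toCharArray(), 0, 1);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }
}
```

      In this sketch, render(true, false) keeps the newline only because the forwarding bug masks the missing method, render(false, false) loses it, and render(false, true) keeps it with both handlers behaving correctly. This suggests that both classes need to change together, although the attached patch may do so differently.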

      Attachments

        1. TIKA-190.patch
          1 kB
          Uwe Schindler

          People

            Assignee: Jukka Zitting (jukkaz)
            Reporter: Uwe Schindler (uschindler)