  1. Tika
  2. TIKA-2019

WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with ToTextHandler


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.14, 2.0.0
    • Component/s: None
    • Labels: None

    Description

      The XML generated by these parsers was correct, but when they were used with the ToTextHandler, spaces and tabs were not added correctly, which led to incorrectly concatenated strings. Further, because these parsers extend the XMLParser, the metadata is extracted but is not well represented in the XML.
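
      The concatenation problem can be illustrated with a minimal sketch that uses only the JDK's SAX API. This is not Tika's code or the actual fix: the class `TextConcatSketch`, the `TextCollector` handler, and the element names `row`, `cell`, and `p` are hypothetical stand-ins for the real WordML/SpreadsheetML elements. The handler keeps only character data, roughly as a plain-text handler would, so adjacent elements run together unless something emits whitespace between them.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; none of these names come from Tika.
class TextConcatSketch {

    // Keeps character data and drops all markup, like a text-only handler.
    static class TextCollector extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();
        private final boolean padElements;

        TextCollector(boolean padElements) {
            this.padElements = padElements;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!padElements) {
                return; // nothing separates adjacent elements
            }
            // Assumed convention: a tab after each cell, a newline after each paragraph.
            if (qName.equals("cell")) {
                out.append('\t');
            } else if (qName.equals("p")) {
                out.append('\n');
            }
        }

        String text() {
            return out.toString();
        }
    }

    // Parses the XML string and returns the collected text.
    static String extract(String xml, boolean padElements) throws Exception {
        TextCollector handler = new TextCollector(padElements);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<row><cell>alpha</cell><cell>beta</cell></row>";
        System.out.println(extract(xml, false)); // tokens run together
        System.out.println(extract(xml, true));  // tokens separated by tabs
    }
}
```

      With padding disabled the two cell values are joined into a single token; with padding enabled each cell is followed by a tab. Whether the separator belongs in the parser's output events or in the handler is a design choice this sketch does not settle.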


          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)
            Votes: 0
            Watchers: 2
