TIKA-1028

Tika-server stops parsing an RFC 822 document prematurely when it encounters an encrypted zip file as an attachment.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects versions: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
    • Fix version: 1.8
    • Components: mime, parser, server
    • Labels: None

    Description

      The Zip parser in tika-server does not accept a password for decrypting a zip file, and it does not handle this unsupported feature gracefully. The problem occurs when an encrypted zip file is attached to an email document being parsed. The parser gives up and throws an exception:

      WARNING: all: Unpacker failed
      org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pkg.PackageParser@10fcc945

      Caused by: org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: unsupported feature encryption used in entry

      Instead of returning the successfully parsed components, Tika-server returns nothing.

      It would be better to return the rest of the parsed document contents, along with the untouched offending zip file, in the archive that Tika-server returns as a result. Until zip decryption is supported, this would always return the zip file untouched. Once decryption is implemented, the zip file should still be returned untouched whenever a wrong password is provided.
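
The requested behavior can be illustrated with a minimal sketch. It is not Tika's code and not the fix that was applied: `LenientUnpacker`, `AttachmentParser`, and `Result` are hypothetical names invented here, and the parser is simulated. The sketch only shows the idea of catching a failure per attachment and keeping that attachment's original bytes, so that one encrypted zip does not discard the parts that parsed successfully.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class LenientUnpacker {

    /** Hypothetical stand-in for a per-attachment parser; not a Tika interface. */
    interface AttachmentParser {
        String parse(String name, byte[] data) throws Exception;
    }

    /** Extracted text for parts that parsed, original bytes for parts that did not. */
    static final class Result {
        final Map<String, String> parsed = new LinkedHashMap<>();
        final Map<String, byte[]> untouched = new LinkedHashMap<>();
    }

    static Result unpack(Map<String, byte[]> attachments, AttachmentParser parser) {
        Result result = new Result();
        for (Map.Entry<String, byte[]> entry : attachments.entrySet()) {
            try {
                result.parsed.put(entry.getKey(),
                        parser.parse(entry.getKey(), entry.getValue()));
            } catch (Exception e) {
                // One bad attachment (for example an encrypted zip) must not
                // discard the parts that were parsed successfully, so the
                // original bytes are kept as they are.
                result.untouched.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, byte[]> attachments = new LinkedHashMap<>();
        attachments.put("body.txt", "hello".getBytes(StandardCharsets.UTF_8));
        attachments.put("Document.zip", new byte[] {0x50, 0x4B, 0x03, 0x04});

        // Simulated parser (an assumption for this sketch): it fails on zip
        // files, much as the encrypted entry does in the report.
        AttachmentParser parser = (name, data) -> {
            if (name.endsWith(".zip")) {
                throw new java.io.IOException("unsupported feature encryption used in entry");
            }
            return new String(data, StandardCharsets.UTF_8);
        };

        Result result = unpack(attachments, parser);
        System.out.println("parsed: " + result.parsed.keySet());
        System.out.println("untouched: " + result.untouched.keySet());
    }
}
```

Catching the broad `Exception` type keeps the sketch short. A real implementation would more likely catch only the specific parse failures it knows how to recover from.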

      Attachments

        1. test.eml (3 kB, Juha Haaga)
        2. Document.zip (0.2 kB, Luís Filipe Nassif)

          People

            Assignee: Unassigned
            Reporter: Juha Haaga (fuu)
            Votes: 1
            Watchers: 6
