Tika / TIKA-3196

PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0, 1.25
    • Component/s: parser
    • Labels: None

      Description

      We use Tika for text extraction. Some sites return zip files containing STORED entries with data descriptors. These fail to extract because ZipArchiveInputStream (in commons-compress) defaults 'allowStoredEntriesWithDataDescriptor' to false.
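
      To illustrate which entries are affected, here is a minimal sketch that inspects the first bytes of a zip local file header. It relies on the layout in the ZIP format specification: a signature, then the general purpose bit flag at offset 6 (bit 3 marks a data descriptor) and the compression method at offset 8 (0 means STORED). The class and method names are hypothetical and are not part of Tika or commons-compress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
class ZipFlagSketch {
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50; // bytes "PK\3\4", little-endian
    static final int DATA_DESCRIPTOR_FLAG = 0x0008;       // general purpose bit 3
    static final int METHOD_STORED = 0;                   // no compression

    // Returns true if the bytes start a local file header for a STORED
    // entry whose sizes and CRC are deferred to a trailing data descriptor.
    static boolean isStoredWithDataDescriptor(byte[] header) {
        if (header.length < 10) {
            return false; // too short to hold signature, version, flags and method
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            return false;
        }
        int flags = buf.getShort(6) & 0xFFFF;
        int method = buf.getShort(8) & 0xFFFF;
        return method == METHOD_STORED && (flags & DATA_DESCRIPTOR_FLAG) != 0;
    }
}
```

      For such an entry a streaming reader cannot learn the data length from the header, so it has to scan for the descriptor instead. That is presumably why the option is off by default.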

      Since ZipArchiveInputStream supports reading zips with data descriptors, we should retry reading the zip with that feature enabled when we get an UnsupportedZipFeatureException for the data descriptor feature.
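
      The retry described above could look roughly like the following sketch. It is not the PackageParser change from the pull request. The class ZipRetrySketch and its methods are hypothetical, and the sketch assumes the whole zip is available as a byte array so the stream can be reopened from the start. It uses the four-argument ZipArchiveInputStream constructor and UnsupportedZipFeatureException.getFeature() from commons-compress.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException;
import org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException.Feature;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;

// Hypothetical helper for illustration; not Tika's actual implementation.
class ZipRetrySketch {

    // Reads every entry to the end and returns the entry names in order.
    static List<String> readAll(byte[] zip, boolean allowStoredWithDataDescriptor)
            throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipArchiveInputStream zis = new ZipArchiveInputStream(
                new ByteArrayInputStream(zip), "UTF-8", true,
                allowStoredWithDataDescriptor)) {
            byte[] buf = new byte[8192];
            ArchiveEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
                // Drain the entry so unsupported features surface here.
                while (zis.read(buf) != -1) {
                    // content is discarded in this sketch
                }
            }
        }
        return names;
    }

    // First attempt uses the library default (false); only a data descriptor
    // failure triggers a second pass with the feature enabled.
    static List<String> readAllWithRetry(byte[] zip) throws IOException {
        try {
            return readAll(zip, false);
        } catch (UnsupportedZipFeatureException e) {
            if (e.getFeature() != Feature.DATA_DESCRIPTOR) {
                throw e; // some other unsupported feature, e.g. encryption
            }
            return readAll(zip, true);
        }
    }
}
```

      The second pass starts from the beginning of the archive, so a caller working with a non-repeatable stream would need to buffer or reset it first.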

      Pull Request: https://github.com/apache/tika/pull/356

        Attachments

        1. OOO-107047-0.oxt-145.zip (2 kB, Tim Allison)

            People

            • Assignee: Unassigned
            • Reporter: Trevor Bentley (tbentley)
