  Tika / TIKA-3196

PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0, 1.27
    • Component/s: parser
    • Labels: None

    Description

      We use Tika for text extraction. Some sites return zip files containing STORED entries that use a data descriptor. These entries fail to extract because ZipArchiveInputStream (in commons-compress) defaults 'allowStoredEntriesWithDataDescriptor' to false.
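
To make the failing case concrete, here is a minimal sketch of how such an entry can be recognized from its local file header. It relies only on the ZIP format itself: bit 3 of the general purpose flag marks a data descriptor, and compression method 0 means STORED. The class and method names are hypothetical and are not part of Tika or commons-compress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Tika or commons-compress.
final class StoredDataDescriptorCheck {
    private static final int LOCAL_FILE_HEADER_SIG = 0x04034b50; // "PK\3\4" read little-endian
    private static final int DATA_DESCRIPTOR_FLAG = 1 << 3;      // general purpose bit 3
    private static final int METHOD_STORED = 0;                  // no compression
    private static final int FIXED_HEADER_LEN = 30;              // fixed part of a local file header

    /** True if the first local file header is a STORED entry that uses a data descriptor. */
    static boolean firstEntryIsStoredWithDataDescriptor(byte[] zip) {
        if (zip.length < FIXED_HEADER_LEN) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_FILE_HEADER_SIG) {
            return false;
        }
        int flags = buf.getShort(6) & 0xFFFF;  // general purpose bit flag
        int method = buf.getShort(8) & 0xFFFF; // compression method
        return method == METHOD_STORED && (flags & DATA_DESCRIPTOR_FLAG) != 0;
    }

    /** Builds only the fixed 30-byte part of a local file header, for experimenting. */
    static byte[] localHeader(int flags, int method) {
        ByteBuffer buf = ByteBuffer.allocate(FIXED_HEADER_LEN).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(LOCAL_FILE_HEADER_SIG);
        buf.putShort((short) 20);     // version needed to extract
        buf.putShort((short) flags);
        buf.putShort((short) method);
        // time, date, CRC, sizes, and name/extra lengths are left as zero
        return buf.array();
    }
}
```

With this combination the sizes in the local header are zero, so a streaming reader cannot tell where the entry data ends without scanning for the descriptor. That is presumably why the option is off by default.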

      Since ZipArchiveInputStream can read such entries when that option is enabled, PackageParser should retry with 'allowStoredEntriesWithDataDescriptor' set to true when it gets an UnsupportedZipFeatureException for the data descriptor feature.
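
The retry idea can be sketched roughly as follows. This is not the code from the pull request: the class `ZipRetrySketch` and its methods are hypothetical, and the sketch assumes the whole archive is held in a byte array so that the second pass can start from the beginning. It uses the four-argument ZipArchiveInputStream constructor from commons-compress, whose last parameter is 'allowStoredEntriesWithDataDescriptor'.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException;
import org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException.Feature;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;

// Hypothetical helper for illustration only; not the PackageParser implementation.
final class ZipRetrySketch {

    /** Reads every entry and returns the entry names, retrying once with data descriptor support. */
    static List<String> listEntries(byte[] zipBytes) throws IOException {
        try {
            // First pass uses the commons-compress default (false).
            return listEntries(zipBytes, false);
        } catch (UnsupportedZipFeatureException e) {
            if (e.getFeature() != Feature.DATA_DESCRIPTOR) {
                throw e; // a different unsupported feature; do not mask it
            }
            // Second pass: start over with the option enabled.
            return listEntries(zipBytes, true);
        }
    }

    private static List<String> listEntries(byte[] zipBytes, boolean allowStoredWithDataDescriptor)
            throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipArchiveInputStream zis = new ZipArchiveInputStream(
                new ByteArrayInputStream(zipBytes), "UTF-8", true, allowStoredWithDataDescriptor)) {
            byte[] buffer = new byte[8192];
            ArchiveEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Drain the entry; a real parser would hand these bytes to an embedded parser.
                while (zis.read(buffer) != -1) {
                    // discard
                }
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

Restarting from the first byte keeps the sketch simple, but it means entries read before the failure are processed twice. A parser working on a non-resettable stream would need to buffer or spool the input before it could retry this way.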

      Pull Request: https://github.com/apache/tika/pull/356

      Attachments

        1. OOO-107047-0.oxt-145.zip
          2 kB
          Tim Allison

            People

              Assignee: Unassigned
              Reporter: Trevor Bentley (tbentley)
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: