Description
We are currently using Tika for text extraction. Some sites return zips containing stored entries with data descriptors. These fail to extract because ZipArchiveInputStream (in commons-compress) defaults 'allowStoredEntriesWithDataDescriptor' to false.
Since ZipArchiveInputStream supports reading such zips, we should retry reading the zip with that feature enabled when we get an UnsupportedZipFeatureException for a data descriptor.
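A minimal sketch of the retry idea, assuming the zip bytes are buffered in memory so the stream can be re-read from the start. The class `ZipRetrySketch` and its methods are hypothetical names used for illustration only. This is not the code from the pull request, and Tika's actual parser may handle rewinding the input differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException;
import org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException.Feature;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;

// Hypothetical helper: not part of Tika or commons-compress.
final class ZipRetrySketch {

    // Try the default (strict) mode first; retry with the flag enabled only
    // when the failure is specifically about a data descriptor.
    static List<String> readEntryNames(byte[] zipBytes) throws IOException {
        try {
            return readEntryNames(zipBytes, false);
        } catch (UnsupportedZipFeatureException e) {
            if (!Feature.DATA_DESCRIPTOR.equals(e.getFeature())) {
                throw e; // some other unsupported feature: do not mask it
            }
            return readEntryNames(zipBytes, true);
        }
    }

    private static List<String> readEntryNames(byte[] zipBytes,
                                               boolean allowStoredEntriesWithDataDescriptor)
            throws IOException {
        List<String> names = new ArrayList<>();
        byte[] buffer = new byte[8192];
        // Four-argument constructor: encoding, useUnicodeExtraFields,
        // allowStoredEntriesWithDataDescriptor.
        try (ZipArchiveInputStream zis = new ZipArchiveInputStream(
                new ByteArrayInputStream(zipBytes), "UTF-8", true,
                allowStoredEntriesWithDataDescriptor)) {
            ArchiveEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Reading the entry data is where the exception surfaces
                // for stored entries that use a data descriptor.
                while (zis.read(buffer) != -1) {
                    // discard; a real parser would pass the bytes on
                }
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

The retry starts over from the first byte, so any entries already handled in the first pass would be processed again; a caller that emits output per entry would need to account for that.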
Pull Request: https://github.com/apache/tika/pull/356
Attachments
Issue Links
- relates to: TIKA-3316 Illegal IOException processing XPS files (Resolved)