Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Environment
Mac OS 10.13.3 (17D47)
17:42 ext$ java -version
java version "9.0.1"
Java(TM) SE Runtime Environment (build 9.0.1+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
17:42 ext$ uname -a
Darwin bix.local 17.4.0 Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 x86_64
Description
This may be related to TIKA-2395. When trying to extract the files from
tika/tika-parsers/src/test/resources/test-documents/test-documents.tgz
% coursier launch org.apache.tika:tika-app:1.17 --main org.apache.tika.cli.TikaCLI -- --extract test-documents.tgz
I see the exception:
Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pkg.CompressorParser@62628e78
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:205)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:486)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at coursier.cli.qR.a(Unknown Source)
at coursier.cli.qQ.j(Unknown Source)
at coursier.cli.qW.a(Unknown Source)
at d.h.a.c(Unknown Source)
at b.b.c_(Unknown Source)
at d.b.d.E.g(Unknown Source)
at d.b.e.aW.g(Unknown Source)
at d.b.f.b.aa.a(Unknown Source)
at coursier.cli.qQ.b(Unknown Source)
at coursier.cli.Q.b(Unknown Source)
at b.J.c_(Unknown Source)
at d.F.h(Unknown Source)
at b.F.a(Unknown Source)
at coursier.cli.Coursier.main(Unknown Source)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at coursier.Bootstrap.main(Bootstrap.java:428)
Caused by: java.io.IOException: mark/reset not supported
at java.base/java.io.InputStream.reset(InputStream.java:474)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:444)
at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
at org.apache.tika.cli.TikaCLI$FileEmbeddedDocumentExtractor.parseEmbedded(TikaCLI.java:1045)
at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:222)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 28 more
However, I can browse the document fine using:
% coursier launch org.apache.tika:tika-app:1.17 --main org.apache.tika.cli.TikaCLI -- test-documents.tgz
This issue affects test-documents.rar, test-documents.tar.Z, test-documents.tbz2, and test-documents.tgz.
It does not affect test-documents.7z, test-documents.cab, test-documents.ddf, test-documents.dmg, test-documents.tar, or test-documents.zip.
This makes me suspect that the problem lies in extracting embedded files when one container is nested inside another, for example the tar archive inside the compressed stream that CompressorParser unwraps for test-documents.tgz.
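The "Caused by" frames in the trace show POIFSContainerDetector.detect calling reset() on a stream that does not support mark/reset. The following is a minimal sketch of that failure mode using only java.io; it is not Tika's code and not the fix that was applied. NoMarkStream, peekFirstByte, and ensureMarkSupported are hypothetical names invented for illustration: the first stands in for a decompressed embedded stream, the second for a detector that peeks at a header and rewinds.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetSketch {

    /**
     * Hypothetical stand-in for a decompressed embedded stream. It overrides
     * only read(), so it inherits the java.io.InputStream defaults:
     * markSupported() returns false and reset() throws an IOException.
     */
    static final class NoMarkStream extends InputStream {
        private final InputStream delegate;

        NoMarkStream(byte[] data) {
            this.delegate = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }
    }

    /** Detector-style peek (assumed pattern): mark, read one header byte, rewind. */
    static int peekFirstByte(InputStream in) throws IOException {
        in.mark(8);
        try {
            return in.read();
        } finally {
            in.reset(); // throws if the stream cannot rewind
        }
    }

    /** Wraps the stream in a BufferedInputStream only when it cannot mark/reset. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xD0, (byte) 0xCF, 0x11};

        try {
            peekFirstByte(new NoMarkStream(data));
        } catch (IOException e) {
            System.out.println("raw stream failed: " + e.getMessage());
        }

        InputStream safe = ensureMarkSupported(new NoMarkStream(data));
        System.out.println("wrapped peek: " + peekFirstByte(safe));
        System.out.println("next read after rewind: " + safe.read());
    }
}
```

Under these assumptions, the peek fails on the raw stream with the same message as in the trace, while the buffered wrapper lets the same peek rewind so the next read sees the first byte again.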