Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version: 1.16
Description
After downloading the latest Tika and running basic commands, tika-app prints unrequested warnings alongside the result, so the output has to be parsed to get at the requested information.
Example 1:
java -jar tika-app-1.16.jar --list-detectors
Dec 05, 2017 3:16:13 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
TIFFImageWriter not loaded. tiff files will not be processed
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
Dec 05, 2017 3:16:13 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
org.apache.tika.detect.DefaultDetector (Composite Detector):
  org.apache.tika.parser.microsoft.POIFSContainerDetector
  org.apache.tika.parser.pkg.ZipContainerDetector
  org.gagravarr.tika.OggDetector
  org.apache.tika.mime.MimeTypes
Example 2:
java -jar tika-app-1.16.jar --text my.xlsx
Dec 05, 2017 3:00:22 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
TIFFImageWriter not loaded. tiff files will not be processed
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
Dec 05, 2017 3:00:22 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO As a convenience, TikaCLI has turned on extraction of inline images for the PDFParser (TIKA-2374). This is not the default option in Tika generally or in tika-server.
As a convenience, TikaCLI has turned on extraction of inline images for the PDFParser (TIKA-2374). This is not the default option in Tika generally or in tika-server.
The expected behavior is to return only the requested information. I do not see a switch to turn off or control unrequested warnings.
I can't imagine this is the correct behavior. It is not documented, nor could I find why such output exists.
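A possible workaround, not verified against this release: the timestamped WARNING blocks look like the default java.util.logging console format, and that handler writes to stderr by default. If that holds here, discarding stderr should leave only the requested output on stdout. The sketch below rests on that assumption; `run_quiet` is a made-up helper name, and the INFO line from Example 2 may come from a different logger and could still show up on stdout.

```shell
# run_quiet is a hypothetical helper: it runs the given command unchanged
# and discards everything the command writes to stderr.
# Assumption: the unwanted warnings go to stderr, not stdout.
run_quiet() {
  "$@" 2>/dev/null
}

# Intended use with the commands from the examples (not executed here):
#   run_quiet java -jar tika-app-1.16.jar --list-detectors
#   run_quiet java -jar tika-app-1.16.jar --text my.xlsx > my.txt
```

This also hides genuine errors written to stderr, so it is a stopgap and not a substitute for a switch that controls the warnings.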