Details
-
Wish
-
Status: Open
-
Major
-
Resolution: Unresolved
-
2.0.17
-
None
-
None
-
Windows (documented above), but likely cross-platform
Description
PDFs with JPEG2000 images fail to extract, even if the dependencies documented in https://pdfbox.apache.org/2.0/dependencies.html are satisfied. There appears to be an additional dependency on https://github.com/jai-imageio/jai-imageio-jpeg2000; without it, the following error is logged:
Nov 09, 2019 11:04:07 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
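To confirm whether a JPEG2000 reader is actually visible on a given classpath, a small diagnostic along these lines may help. This is only a sketch: the class name Jpeg2000ReaderCheck and the helper hasReader are made up for illustration, and it assumes the plugin registers itself under the ImageIO format name "JPEG2000".

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical diagnostic class, not part of PDFBox.
public class Jpeg2000ReaderCheck {

    // Returns true if ImageIO has at least one reader registered for the given format name.
    public static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Pick up any ImageIO plugins present on the current classpath.
        ImageIO.scanForPlugins();
        // Assumption: the JPEG2000 plugin uses the format name "JPEG2000".
        System.out.println("JPEG2000 reader available: " + hasReader("JPEG2000"));
    }
}
```

Running this with the same -cp value used for the PDFBox app should indicate whether the JPEG2000 plugin jar is being picked up.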
Additionally, for a novice user of the command-line tool, it is not clear how to use the sample CLI application once the dependencies are satisfied. For example, simply adding the three (3) jar files to the classpath is not sufficient: because java ignores the -cp setting when -jar is used, the main jar file must be run without the -jar parameter and with the entry class name specified explicitly (alternatively, the jar file contents need to be merged).
I found that the following was needed:
- jai-imageio-core-1.4.0.jar
- jai-imageio-jpeg2000-1.3.0.jar
- pdfbox-app-2.0.17.jar
and then call via:
REM Windows classpath separators used
java -cp jai-imageio-jpeg2000-1.3.0.jar;jai-imageio-core-1.4.0.jar;pdfbox-app-2.0.17.jar org.apache.pdfbox.tools.PDFBox ExtractImages TEST.pdf
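Since the issue is likely cross-platform, the same invocation can be assembled with the platform's own classpath separator. The following is a minimal sketch, assuming the three jars sit in the current directory; the class name PdfBoxCommand and the helper buildCommand are hypothetical and only print the command rather than run it.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
public class PdfBoxCommand {

    // Builds the java command line, joining the jars with the given classpath separator.
    public static List<String> buildCommand(String separator, List<String> jars, String pdf) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(String.join(separator, jars));
        // Entry class named explicitly, because -jar cannot be combined with -cp.
        cmd.add("org.apache.pdfbox.tools.PDFBox");
        cmd.add("ExtractImages");
        cmd.add(pdf);
        return cmd;
    }

    public static void main(String[] args) {
        // Assumption: the jars are in the current working directory.
        List<String> jars = Arrays.asList(
                "jai-imageio-jpeg2000-1.3.0.jar",
                "jai-imageio-core-1.4.0.jar",
                "pdfbox-app-2.0.17.jar");
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        System.out.println(String.join(" ", buildCommand(File.pathSeparator, jars, "TEST.pdf")));
    }
}
```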
It's possible there is a code solution to this, but a quick fix is to:
- document the jpeg2000 dependency as well as the (already) documented JAI Image I/O need
- document how to then use these with the sample pdfbox app with an example