Take the PDF from PDFBOX-1708, put a breakpoint in the decode() method of the class CCITTFaxFilter, and run PDFToImage. You will see the debugger stop twice, even though the PDF contains a single image.
The second call happens when the image is rendered to G2D, which is OK. The first time, however, the image is decompressed in the constructor of PDImageXObject (line 147) just to allow the filter (CCITTFaxFilter in this case) to provide additional dictionary parameters in case something is missing in the input (here, COLORSPACE would be set to DeviceGray if missing).
I think this is a complete waste. The filter should be able to fix the dictionary without having to decode the image. As far as I can tell, this could be done by implementing a repair method on COSStream and on implementations of Filter.
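To illustrate the idea, here is a minimal sketch of a repair step that only looks at the dictionary and never touches the encoded bytes. Everything in it is hypothetical: RepairableFilterSketch, CcittFaxRepairSketch, and the plain Map standing in for the stream dictionary are illustration names, not existing PDFBox API. The "ColorSpace" key and the DeviceGray default mirror the case described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: a filter that can fix up a stream dictionary
// without decoding the stream. Not part of PDFBox.
interface RepairableFilterSketch {
    // Fills in missing entries; must not read or decode the encoded data.
    void repair(Map<String, Object> dict);
}

// Hypothetical CCITT fax variant: supplies a default color space if absent.
class CcittFaxRepairSketch implements RepairableFilterSketch {
    @Override
    public void repair(Map<String, Object> dict) {
        // Only set the default when the entry is missing (assumption:
        // an existing entry should be left untouched).
        dict.putIfAbsent("ColorSpace", "DeviceGray");
    }
}

class RepairDemoSketch {
    public static void main(String[] args) {
        // A plain map stands in for the image stream dictionary.
        Map<String, Object> dict = new HashMap<>();
        dict.put("Filter", "CCITTFaxDecode");

        new CcittFaxRepairSketch().repair(dict);

        System.out.println(dict.get("ColorSpace")); // prints "DeviceGray"
    }
}
```

With something along these lines, the constructor of PDImageXObject could ask the filter to repair the dictionary and leave the actual decoding to the rendering step.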
Also, I do not see that the stream created in the above-mentioned constructor of PDImageXObject is ever closed. This seems to be a more general issue. I have put a counter into COSInputStream.create(), at the point where it creates new RandomAccessInputStream(buffer). With the test file from PDFBOX-1708, I end up with 3 unclosed streams when the program finishes. I am not sure whether this is important, but I guess the unclosed streams are needlessly occupying space in the scratch file.
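For reference, the kind of counting I mean can be sketched as follows. This is not the PDFBox code: CountingStreamSketch and StreamLeakDemoSketch are hypothetical names, and a ByteArrayInputStream stands in for the real stream. It only shows how a counter exposes a stream that is never closed, and how try-with-resources avoids the leak.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper that tracks how many streams are currently open.
class CountingStreamSketch extends FilterInputStream {
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    CountingStreamSketch(InputStream in) {
        super(in);
        OPEN.incrementAndGet();
    }

    @Override
    public void close() throws IOException {
        // Decrement only once, even if close() is called repeatedly.
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
        super.close();
    }
}

class StreamLeakDemoSketch {
    // Reads one byte and always closes the stream via try-with-resources.
    static int readFirstByte(byte[] data) {
        try (InputStream in = new CountingStreamSketch(new ByteArrayInputStream(data))) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Created but never closed: stays counted as open.
        InputStream leaked = new CountingStreamSketch(new ByteArrayInputStream(new byte[] {1}));

        // Closed automatically: does not change the final count.
        readFirstByte(new byte[] {42});

        System.out.println(CountingStreamSketch.OPEN.get()); // prints 1
    }
}
```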
Sorry if this is just a lack of understanding of the code on my side, but I could not resist reporting what I see.