Details
Description
I wrote a test application using org.apache.pdfbox.ExtractImages to extract images as PNG. (This is the start of something bigger, which involves compiling statistics about the content of over a million pages within PDF files.) However, every image I get is entirely black or entirely white when I test on our own PDF files. I did get correct images from a file that had color images. To extract, I tried page.convertToImage() followed by ImageIO.write(), and I also tried PDFImageWriter; neither approach worked for black-and-white images.
The sample PDF is not confidential; it does give a warning "getRGBImage returned NULL" but other PDFs that don't give the warning (but are confidential) also fail.
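For anyone trying to reproduce this across many files, a small check like the one below can flag outputs that came out as a single solid color. It is only a sketch using the standard java.awt.image.BufferedImage class; the class name UniformImageCheck and the method isUniform are hypothetical helpers, not part of PDFBox, and the image passed in is assumed to be whatever page.convertToImage() or ImageIO.read() returned.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper (not part of PDFBox): detects images that are one solid color,
// which is how the failed black-and-white extractions described above show up.
public class UniformImageCheck {

    /** Returns true if every pixel has the same ARGB value as the top-left pixel. */
    public static boolean isUniform(BufferedImage image) {
        int first = image.getRGB(0, 0);
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if (image.getRGB(x, y) != first) {
                    return false; // found a differing pixel, so the image has content
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A freshly created 1-bit image stands in for a failed extraction.
        BufferedImage img = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_BINARY);
        System.out.println("blank image uniform: " + isUniform(img));

        // Setting one opaque white pixel makes the image non-uniform.
        img.setRGB(3, 3, 0xFFFFFFFF);
        System.out.println("after one white pixel uniform: " + isUniform(img));
    }
}
```

A genuinely blank page would also be reported as uniform, so a hit from this check is a hint to inspect the source PDF rather than proof of the bug.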
Attachments
Issue Links
- is duplicated by
  - PDFBOX-1018 Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.) (Closed)
  - PDFBOX-794 PDPage convertToImage generates white image with no contents (Closed)