I wrote a test application using org.apache.pdfbox.ExtractImages to extract images as PNG. (This is the start of something bigger, which involves gathering statistics on the content of over a million pages in PDF files.) However, every image I get is either all black or all white when I test on our own PDF files. I did get correct images from a file that contained color images. To extract, I tried page.convertToImage() followed by ImageIO.write(), and I also tried PDFImageWriter; neither worked for black-and-white images.
The sample PDF is not confidential. It gives the warning "getRGBImage returned NULL", but other PDFs (which are confidential) that do not give the warning also fail.
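
With this many pages, the all-black or all-white results probably need to be detected automatically rather than by opening each PNG. A check along the following lines could do that. It is only a minimal sketch using the standard java.awt and javax.imageio classes; the class name UniformImageCheck and the method isUniform are hypothetical names made up for illustration and are not part of PDFBox.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper (not part of PDFBox): flags images that consist of a single colour.
final class UniformImageCheck {

    // Returns true when every pixel has the same ARGB value as the top-left pixel,
    // i.e. the image is entirely one colour (all black, all white, and so on).
    static boolean isUniform(BufferedImage image) {
        int first = image.getRGB(0, 0);
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if (image.getRGB(x, y) != first) {
                    return false;
                }
            }
        }
        return true;
    }

    // Usage: java UniformImageCheck page1.png page2.png ...
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            // ImageIO.read returns null when no registered reader can decode the file.
            BufferedImage image = ImageIO.read(new File(path));
            if (image == null) {
                System.out.println(path + ": not a readable image");
            } else if (isUniform(image)) {
                System.out.println(path + ": single colour, extraction probably failed");
            } else {
                System.out.println(path + ": ok");
            }
        }
    }
}
```

Running it over the extracted PNG files would list the ones that came out blank. A legitimately blank page would also be flagged, so the result is a hint, not proof that extraction failed.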