PDFBox / PDFBOX-955

Can't extract b/w images from PDF



    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.6.0
    • Component/s: None
    • Labels:
    • Environment:
      Windows XP prof, Java 1.6.0_22, Netbeans 6.9.1


      I wrote a test application using org.apache.pdfbox.ExtractImages to extract images as PNG. (This is the start of something bigger: compiling statistics on the content of over a million pages in PDF files.) However, when I test on our own PDF files, every extracted image comes out all black or all white. A file that had color images did produce correct output. To extract, I tried page.convertToImage() followed by ImageIO.write(), and I also tried PDFImageWriter; neither worked for b/w images.

      The sample PDF is not confidential. It triggers the warning "getRGBImage returned NULL", but other PDFs (which are confidential) fail in the same way without producing that warning.
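
When processing a large batch of files, the symptom described above (output that is entirely black or entirely white) can be flagged automatically. The following is a minimal sketch using only the JDK; the class UniformImageCheck and its method isUniform are hypothetical names introduced for illustration and are not part of PDFBox. It assumes the extracted image is already available as a java.awt.image.BufferedImage, for example the value passed to ImageIO.write().

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper (not part of PDFBox): detects single-color output. */
final class UniformImageCheck {

    private UniformImageCheck() {
    }

    /**
     * Returns true if every pixel has the same RGB value as the top-left
     * pixel, which is the case for an all-black or all-white image.
     * The alpha channel is ignored.
     */
    static boolean isUniform(BufferedImage image) {
        int first = image.getRGB(0, 0) & 0xFFFFFF;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if ((image.getRGB(x, y) & 0xFFFFFF) != first) {
                    return false; // found a differing pixel, image has content
                }
            }
        }
        return true;
    }
}
```

A batch run could call UniformImageCheck.isUniform(image) before writing each PNG and log the source file name whenever it returns true. A uniform image is not always wrong (a genuinely blank scan is also uniform), so this only narrows down which files to inspect.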


      Attachments

        1. photo.pdf (333 kB, Roel Pieters)
        2. photo.jpg (14 kB, Roel Pieters)
        3. PDFBOX955-photo1.png (337 kB, Andreas Lehmkühler)
        4. PDFBOX955-d00000401.png (44 kB, Andreas Lehmkühler)
        5. ExtractImages.java (4 kB, Tilman Hausherr)
        6. d0000040-01.png (9 kB, Tilman Hausherr)
        7. d0000040.pdf (8 kB, Tilman Hausherr)
        8. ccitt4-cib-test-01.png (20 kB, Tilman Hausherr)
        9. ccitt4-cib-test.pdf (10 kB, Tilman Hausherr)

              • Assignee:
                lehmi Andreas Lehmkühler
                tilman Tilman Hausherr
              • Votes: 0
              • Watchers: 2


                • Created: