PDFBOX-955: Can't extract b/w images from PDF



    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.6.0
    • Component/s: None
    • Labels:
    • Environment:
      Windows XP prof, Java 1.6.0_22, Netbeans 6.9.1


      I wrote a test application using org.apache.pdfbox.ExtractImages to extract images as PNG. (This is the start of something bigger, which involves compiling statistics on the content of over a million pages in PDF files.) However, every image I get from our own PDF files is either entirely black or entirely white. I did get correct images from a file that contained color images. To extract, I tried page.convertToImage() followed by ImageIO.write(), and I also tried PDFImageWriter; neither worked for b/w images.

      The sample PDF is not confidential. It does produce the warning "getRGBImage returned NULL", but other PDFs that do not produce the warning (and are confidential) also fail.
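
      When processing a large batch, the symptom described above (output that is entirely black or entirely white) could be detected automatically. The following is only a minimal sketch under stated assumptions: it uses the JDK alone (BufferedImage and ImageIO), not PDFBox, and the class and method names (UniformImageCheck, isUniform, pngRoundTrip) are hypothetical helpers invented for illustration. It builds 1-bit images in memory, runs them through a PNG write/read cycle, and reports whether all pixels share one color.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of PDFBox: flags images whose pixels are all one color.
final class UniformImageCheck {

    /** Returns true if every pixel has the same RGB value (for example all black or all white). */
    public static boolean isUniform(BufferedImage image) {
        int first = image.getRGB(0, 0);
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if (image.getRGB(x, y) != first) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Encodes the image as PNG in memory and decodes it again, mimicking a write/read cycle. */
    public static BufferedImage pngRoundTrip(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
    }

    public static void main(String[] args) throws IOException {
        // TYPE_BYTE_BINARY gives a 1-bit black/white image; a new one starts all black.
        BufferedImage blank = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_BINARY);
        BufferedImage marked = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_BINARY);
        marked.setRGB(3, 3, 0xFFFFFFFF); // set a single white pixel

        System.out.println("blank uniform:  " + isUniform(pngRoundTrip(blank)));
        System.out.println("marked uniform: " + isUniform(pngRoundTrip(marked)));
    }
}
```

      A check like this only flags suspicious output. A page that is genuinely blank would also be reported as uniform, so flagged files would still need a manual look.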


        1. d0000040-01.png (9 kB, Tilman Hausherr)
        2. d0000040.pdf (8 kB, Tilman Hausherr)
        3. ExtractImages.java (4 kB, Tilman Hausherr)
        4. photo.pdf (333 kB, Roel Pieters)
        5. photo.jpg (14 kB, Roel Pieters)
        6. PDFBOX955-photo1.png (337 kB, Andreas Lehmkühler)
        7. PDFBOX955-d00000401.png (44 kB, Andreas Lehmkühler)
        8. ccitt4-cib-test.pdf (10 kB, Tilman Hausherr)
        9. ccitt4-cib-test-01.png (20 kB, Tilman Hausherr)


              • Assignee:
                Andreas Lehmkühler
              • Reporter:
                Tilman Hausherr
              • Votes:
                0
              • Watchers:
                2


                • Created: