PDFBox / PDFBOX-955

Can't extract b/w images from PDF

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.6.0
    • None
    • Environment: Windows XP prof, Java 1.6.0_22, Netbeans 6.9.1

    Description

      I wrote a test application using org.apache.pdfbox.ExtractImages to extract images as PNG. (This is the start of something bigger, which involves compiling statistics on the content of over a million pages in PDF files.) However, every image I get is entirely black or entirely white when I test on our own PDF files. I did get correct images from a file that had color images. To extract, I tried page.convertToImage() followed by ImageIO.write(), and I also tried PDFImageWriter; neither worked for b/w images.

      The sample PDF is not confidential. It gives the warning "getRGBImage returned NULL", but other PDFs that do not give the warning (and are confidential) also fail.
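
      A small diagnostic may help narrow down where the image content is lost. The sketch below is not part of PDFBox and uses only the JDK; the class and method names (UniformImageCheck, isUniform, roundTripPng) are hypothetical. It checks whether an image is a single flat color, as the failing extractions are, and round-trips a synthetic 1-bit image through ImageIO as PNG. If a 1-bit image survives that round trip, the all-black or all-white result is more likely produced while decoding the PDF image than while writing the PNG, although that remains an assumption to verify against the real files.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not a PDFBox class: flags images that came out as a
// single flat color (for example all black or all white).
public class UniformImageCheck {

    /** Returns true when every pixel has the same ARGB value. */
    public static boolean isUniform(BufferedImage image) {
        int first = image.getRGB(0, 0);
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if (image.getRGB(x, y) != first) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Encodes the image as PNG in memory and decodes it again. */
    public static BufferedImage roundTripPng(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available for this image type");
        }
        return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
    }

    public static void main(String[] args) throws IOException {
        // Synthetic 1-bit checkerboard standing in for a correctly decoded b/w scan.
        BufferedImage checker = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                checker.setRGB(x, y, (x + y) % 2 == 0 ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        // A freshly created 1-bit image is a single color, like the failing output.
        BufferedImage blank = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_BINARY);

        System.out.println("checkerboard uniform after PNG round trip: "
                + isUniform(roundTripPng(checker)));
        System.out.println("blank uniform after PNG round trip: "
                + isUniform(roundTripPng(blank)));
    }
}
```

      Running isUniform over each extracted image would also give a simple count of affected files, which fits the statistics use case mentioned above.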

      Attachments

        1. photo.pdf (333 kB, Roel Pieters)
        2. photo.jpg (14 kB, Roel Pieters)
        3. PDFBOX955-photo1.png (337 kB, Andreas Lehmkühler)
        4. PDFBOX955-d00000401.png (44 kB, Andreas Lehmkühler)
        5. ExtractImages.java (4 kB, Tilman Hausherr)
        6. d0000040-01.png (9 kB, Tilman Hausherr)
        7. d0000040.pdf (8 kB, Tilman Hausherr)
        8. ccitt4-cib-test-01.png (20 kB, Tilman Hausherr)
        9. ccitt4-cib-test.pdf (10 kB, Tilman Hausherr)

            People

              lehmi Andreas Lehmkühler
              tilman Tilman Hausherr
              Votes: 0
              Watchers: 2
