PDFBox / PDFBOX-955

Can't extract b/w images from PDF

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.6.0
    • Labels: None
    • Environment: Windows XP Professional, Java 1.6.0_22, NetBeans 6.9.1

    Description

      I wrote a test application using org.apache.pdfbox.ExtractImages to extract images as PNG files. (This is the start of something bigger: compiling statistics about the content of over a million pages in PDF files.) However, when I test on our own PDF files, every image I get is either completely black or completely white. I did get correct images from a file that contained color images. To extract, I tried page.convertToImage() followed by ImageIO.write(), and I also tried PDFImageWriter; neither worked for b/w images.

      The sample PDF is not confidential. It produces the warning "getRGBImage returned NULL", but other PDFs that don't produce this warning (and are confidential) also fail.
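      For a batch run over many pages, it may help to automatically flag extractions that come out as one flat color, which is the symptom described above. The following is a minimal stdlib-only sketch, not part of PDFBox: the class UniformImageCheck and the methods isUniform and pngRoundTrip are hypothetical names. It assumes that an all-black or all-white result indicates a failed extraction, so a genuinely blank image would also be flagged.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper (not a PDFBox API) for spotting all-black / all-white output.
public class UniformImageCheck {

    /** Returns true if every pixel of the image has the same ARGB value. */
    static boolean isUniform(BufferedImage img) {
        int first = img.getRGB(0, 0);
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (img.getRGB(x, y) != first) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Encodes the image as PNG in memory and decodes it again, like ImageIO.write() to a file and reading it back. */
    static BufferedImage pngRoundTrip(BufferedImage img) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(img, "png", out)) {
            throw new IOException("no PNG writer available for this image type");
        }
        return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
    }

    public static void main(String[] args) throws IOException {
        // A 1-bit (black/white) image; a new TYPE_BYTE_BINARY image starts all black.
        BufferedImage bw = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_BINARY);
        System.out.println(isUniform(bw)); // prints "true"
        bw.setRGB(3, 3, 0xFFFFFFFF); // set a single white pixel
        System.out.println(isUniform(pngRoundTrip(bw))); // prints "false"
    }
}
```

      In a real run, the BufferedImage would come from the extraction step (for example page.convertToImage()), and flagged pages could be logged for manual inspection instead of being counted in the statistics.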

      Attachments

        1. d0000040-01.png, 9 kB, Tilman Hausherr
        2. d0000040.pdf, 8 kB, Tilman Hausherr
        3. ExtractImages.java, 4 kB, Tilman Hausherr
        4. photo.pdf, 333 kB, Roel Pieters
        5. photo.jpg, 14 kB, Roel Pieters
        6. PDFBOX955-photo1.png, 337 kB, Andreas Lehmkühler
        7. PDFBOX955-d00000401.png, 44 kB, Andreas Lehmkühler
        8. ccitt4-cib-test.pdf, 10 kB, Tilman Hausherr
        9. ccitt4-cib-test-01.png, 20 kB, Tilman Hausherr

            People

              Assignee: Andreas Lehmkühler (lehmi)
              Reporter: Tilman Hausherr (tilman)
              Votes: 0
              Watchers: 2