PDFBOX-1072

PDFImageWriter extracts black images from Arabic PDFs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.8.0
    • Component/s: Utilities
    • Labels:

      Description

      When I try to extract a JPEG image from an Arabic PDF, I get a corrupted file in which a black area covers all the Arabic text on each page.
      The console shows only this debug message, with no exceptions or other errors:
      DEBUG (PDPixelMap.java:241) - ColorModel: IndexColorModel: #pixelBits = 1 numComponents = 4 color space = java.awt.color.ICC_ColorSpace@2eeb3c84 transparency = 2 transIndex = 1 has alpha = true isAlphaPre = false
      This is not limited to a single PDF file. I have about 400-500 files that produce the same result.

      Code:
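      PDFImageWriter writer = new PDFImageWriter();
      PDDocument document = PDDocument.load(sourceFile);
      // arguments: document, image format, password, start page, end page, output prefix
      writer.writeImage(document, "jpg", "", 1, 1, filename);
      document.close();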
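
      The debug message reports a 1-bit IndexColorModel with a transparent palette entry (transIndex = 1, has alpha = true), and JPEG has no alpha channel. One plausible reading, offered here as an assumption and not a confirmed diagnosis, is that the transparent pixels end up black when the image is encoded as JPEG. The sketch below uses only standard java.awt classes and shows the general technique of painting such an image over an opaque white canvas before encoding. The class name FlattenAlpha and the method flattenOntoWhite are hypothetical, and this is not the change that shipped in PDFBox 1.8.0.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.IndexColorModel;

// Hypothetical helper, not part of PDFBox.
public class FlattenAlpha {

    /** Paints src over an opaque white canvas so that no alpha channel remains. */
    public static BufferedImage flattenOntoWhite(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g2d = rgb.createGraphics();
        try {
            g2d.setColor(Color.WHITE);
            g2d.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            // Transparent source pixels leave the white background visible.
            g2d.drawImage(src, 0, 0, null);
        } finally {
            g2d.dispose();
        }
        return rgb;
    }

    public static void main(String[] args) {
        // 1-bit palette resembling the one in the debug message:
        // index 0 = opaque black, index 1 = fully transparent.
        byte[] r = {0, 0};
        byte[] g = {0, 0};
        byte[] b = {0, 0};
        byte[] a = {(byte) 255, 0};
        IndexColorModel cm = new IndexColorModel(1, 2, r, g, b, a);

        BufferedImage mask = new BufferedImage(2, 1, BufferedImage.TYPE_BYTE_BINARY, cm);
        mask.getRaster().setSample(1, 0, 0, 1); // second pixel uses the transparent index

        BufferedImage flat = flattenOntoWhite(mask);
        System.out.printf("%06X %06X%n",
                flat.getRGB(0, 0) & 0xFFFFFF, flat.getRGB(1, 0) & 0xFFFFFF);
    }
}
```

      The flattened image can then be passed to javax.imageio.ImageIO.write with the "jpg" format. If the assumption about transparency holds, writing the pages in a format that supports alpha, such as "png", may also avoid the black areas.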

        Attachments

        1. page9_thumbnail.png
          2 kB
          Anton Stremoukhov


              People

              • Assignee: Andreas Lehmkühler (lehmi)
              • Reporter: Anton Stremoukhov (delson)
              • Votes: 0
              • Watchers: 1
