PDFBOX-1072: PDFImageWriter extracts black images from Arabic PDFs

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.8.0
    • Component/s: Utilities

    Description

      When I try to extract a JPEG image from an Arabic PDF, I get a corrupted file in which a black area overlays all of the Arabic text on each page.
      The console shows only this debug message, with no exceptions or other errors:
      DEBUG (PDPixelMap.java:241) - ColorModel: IndexColorModel: #pixelBits = 1 numComponents = 4 color space = java.awt.color.ICC_ColorSpace@2eeb3c84 transparency = 2 transIndex = 1 has alpha = true isAlphaPre = false
      This is not limited to a single PDF file: I have about 400-500 files that produce the same result.

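      Code:
      import org.apache.pdfbox.pdmodel.PDDocument;
      import org.apache.pdfbox.util.PDFImageWriter;

      PDFImageWriter writer = new PDFImageWriter();
      PDDocument document = PDDocument.load(sourceFile);
      // Render page 1 only (startPage = 1, endPage = 1) as JPEG, with an empty password
      writer.writeImage(document, "jpg", "", 1, 1, filename);
      document.close();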
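
      The debug line reports a 1-bit IndexColorModel with "has alpha = true" and a transparent index, and JPEG has no alpha channel. Whether that is the actual cause inside PDFBox is not established in this report, so the following is only a minimal sketch of the general technique of flattening a transparent image onto an opaque background before JPEG encoding. The class FlattenAlpha and the method flattenOntoWhite are hypothetical names and are not part of PDFBox; only standard java.awt classes are used.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox: illustrates alpha flattening only.
public class FlattenAlpha {

    /**
     * Returns an opaque RGB copy of src in which every transparent
     * pixel is replaced by white.
     */
    public static BufferedImage flattenOntoWhite(BufferedImage src) {
        // TYPE_INT_RGB has no alpha channel, so it is safe to encode as JPEG.
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // Fill the background first, then composite the source over it.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }
}
```

      An image flattened this way could then be written with javax.imageio.ImageIO.write(flattened, "jpg", file). Another option worth trying, assuming the affected files need not be JPEG, is to pass a format that supports transparency (for example "png") as the image format argument of writeImage.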

      Attachments

        1. page9_thumbnail.png
          2 kB
          Anton Stremoukhov

            People

              Assignee: lehmi Andreas Lehmkühler
              Reporter: delson Anton Stremoukhov