[PDFBOX-1072] PDFImageWriter extracts black images from arabic PDFs - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0
Fix Version/s: 1.8.0
Component/s: Utilities
Labels:
- JBIG2

Description

When I try to extract a JPEG image from an Arabic PDF, I get a corrupted file with a black area that overlays all of the Arabic text on each page.
The console shows only this debug message, with no exceptions or other errors:
DEBUG (PDPixelMap.java:241) - ColorModel: IndexColorModel: #pixelBits = 1 numComponents = 4 color space = java.awt.color.ICC_ColorSpace@2eeb3c84 transparency = 2 transIndex = 1 has alpha = true isAlphaPre = false
This is not limited to one PDF file. I have about 400-500 files that produce the same result.

Code:
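import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFImageWriter;
PDFImageWriter writer = new PDFImageWriter();
PDDocument document = PDDocument.load(sourceFile);
writer.writeImage(document, "jpg", "", 1, 1, filename); // page 1 only, empty password
document.close(); // release the loaded document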
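
The debug line above reports a 1-bit IndexColorModel with bitmask transparency (transparency = 2, transIndex = 1), while JPEG has no alpha channel. The report does not confirm that this is the cause, and the following is not the fix that went into 1.8.0. It is only a minimal sketch, using plain java.awt and no PDFBox calls, of how an image with such a color model could be flattened onto an opaque white RGB canvas before being encoded as JPEG. The class and method names (AlphaFlattenSketch, makeBitmaskImage, flattenOntoWhite) are hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.IndexColorModel;

public class AlphaFlattenSketch {

    /**
     * Builds a 1-bit indexed image whose palette index 1 is fully transparent,
     * mimicking the ColorModel in the debug message (assumed layout, not taken
     * from PDFBox). All pixels start at index 0, which is opaque black.
     */
    public static BufferedImage makeBitmaskImage(int width, int height) {
        byte[] r = {0, (byte) 255};
        byte[] g = {0, (byte) 255};
        byte[] b = {0, (byte) 255};
        // Last argument marks palette index 1 as the transparent entry.
        IndexColorModel cm = new IndexColorModel(1, 2, r, g, b, 1);
        return new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY, cm);
    }

    /**
     * Paints the source over a white background into an opaque RGB image,
     * so that no alpha information is left for a JPEG encoder to misread.
     */
    public static BufferedImage flattenOntoWhite(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g2 = rgb.createGraphics();
        try {
            g2.setColor(Color.WHITE);
            g2.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g2.drawImage(src, 0, 0, null); // transparent pixels leave white showing
        } finally {
            g2.dispose();
        }
        return rgb;
    }

    public static void main(String[] args) {
        BufferedImage src = makeBitmaskImage(2, 1);
        src.getRaster().setSample(1, 0, 0, 1); // make pixel (1,0) transparent
        BufferedImage flat = flattenOntoWhite(src);
        System.out.println(Integer.toHexString(flat.getRGB(0, 0)));
        System.out.println(Integer.toHexString(flat.getRGB(1, 0)));
    }
}
```

The flattened image could then be passed to javax.imageio.ImageIO.write with the "jpg" format. Whether this changes the output for the affected files has not been verified here.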

Attachments

page9_thumbnail.png
19/Jul/11 13:01
2 kB
Anton Stremoukhov

Issue Links

relates to

PDFBOX-1067 PDF Scan from Xerox WorkCentre 5030 renders as all black

Closed

People

Assignee: Andreas Lehmkühler

Reporter: Anton Stremoukhov

Votes: 0

Watchers: 1

Dates

Created: 19/Jul/11 12:59

Updated: 23/Mar/13 12:56

Resolved: 18/Nov/12 14:46