[PDFBOX-4530] PDFRenderer adding horizontal white lines to exported image - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Bug
Affects Version/s: 2.0.15
Fix Version/s: None
Component/s: Rendering
Labels: None

Description

Hello,

I recently started using PDFBox to extract a datamatrix code from a PDF file.

Image extraction itself works fine.

However, we found that the tool producing the PDFs does not attach the datamatrix as an embedded object or as an inline image; it is drawn in the PDF as black squares.

So the idea was to render the PDF page to an image and parse the code from that.

The only problem is that the conversion sometimes adds white lines inside the datamatrix, which makes it unparsable (see attachments page-3-1.jpeg and page-3.pdf).
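To check a rendered image for this symptom before handing it to the parser, a small scan for all-white rows can help. The sketch below is only an illustration: `WhiteRowScan`, `findWhiteRows` and `isDark` are hypothetical names, and it assumes the image has already been cropped to the datamatrix region. Since a Data Matrix symbol has a solid border on its left edge, any row inside the symbol without a single dark pixel is suspicious.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

class WhiteRowScan {

    /**
     * Returns the y coordinates of rows that contain no dark pixel but lie
     * between the first and the last row that do contain one.
     * Assumes the image is cropped to the symbol (hypothetical helper).
     */
    static List<Integer> findWhiteRows(BufferedImage img) {
        int height = img.getHeight();
        boolean[] hasDark = new boolean[height];
        int first = -1;
        int last = -1;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (isDark(img.getRGB(x, y))) {
                    hasDark[y] = true;
                    break;
                }
            }
            if (hasDark[y]) {
                if (first < 0) {
                    first = y;
                }
                last = y;
            }
        }
        List<Integer> gaps = new ArrayList<>();
        // Only rows strictly inside the dark region count as gaps.
        for (int y = first + 1; y < last; y++) {
            if (!hasDark[y]) {
                gaps.add(y);
            }
        }
        return gaps;
    }

    /** Treats a pixel as dark when its average channel value is below 128. */
    static boolean isDark(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r + g + b) / 3 < 128;
    }
}
```

An empty result does not prove the symbol is readable; it only rules out full-width white rows of the kind visible in page-3-1.jpeg.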

In other cases, the datamatrix squares differ in size in the exported image although they are all the same size in the original PDF file (see attachments page-7-1.jpeg and page-7.pdf).
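One possible explanation for squares of differing size is plain pixel-grid rounding. PDF user space defaults to 72 units per inch, so rendering at 600 DPI scales the page by 600/72 ≈ 8.33, which is not a whole number of pixels per unit. The sketch below is a simplified model, not PDFBox's actual rasterization: `PixelGridDemo` and `moduleWidths` are hypothetical names, and the 1.0 pt module size is an assumption, since the real module size in page-7.pdf is not stated here.

```java
import java.util.Arrays;

class PixelGridDemo {

    /**
     * Pixel widths of consecutive equal-sized modules when every module edge
     * is snapped to the nearest pixel (simplified model, hypothetical helper).
     */
    static int[] moduleWidths(double modulePt, double dpi, int count) {
        double scale = dpi / 72.0; // default PDF user-space unit is 1/72 inch
        int[] widths = new int[count];
        for (int k = 0; k < count; k++) {
            long left = Math.round(k * modulePt * scale);
            long right = Math.round((k + 1) * modulePt * scale);
            widths[k] = (int) (right - left);
        }
        return widths;
    }

    public static void main(String[] args) {
        // 600 DPI: 8.33 px per point, so widths alternate between 8 and 9.
        System.out.println(Arrays.toString(moduleWidths(1.0, 600, 6))); // [8, 9, 8, 8, 9, 8]
        // 576 DPI: exactly 8 px per point, so all widths are equal.
        System.out.println(Arrays.toString(moduleWidths(1.0, 576, 6))); // [8, 8, 8, 8, 8, 8]
    }
}
```

Under this model, a DPI that maps one module to a whole number of pixels keeps the squares uniform. Whether that applies to the attached files depends on their actual module size and page transformation.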

The outcome is the same: the parser cannot recognize the datamatrix content.

The code I am using to convert a page to a BufferedImage is straightforward:

    BufferedImage bi = new PDFRenderer(pdDocument).renderImageWithDPI(i, 600, ImageType.BINARY);
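One way to investigate whether the 1-bit conversion contributes to the artifacts is to render in a grayscale mode (for example `ImageType.GRAY`) and apply an explicit threshold afterwards. This is an assumption to test, not a fix confirmed in this issue. The sketch uses only the JDK; `ManualThreshold` and `toBinary` are hypothetical names, and the threshold value is something to tune by experiment.

```java
import java.awt.image.BufferedImage;

class ManualThreshold {

    /**
     * Converts an image to 1-bit black and white. A pixel becomes black when
     * its luminance (0-255) is below the given threshold (hypothetical helper).
     */
    static BufferedImage toBinary(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luminance weights.
                int lum = (r * 299 + g * 587 + b * 114) / 1000;
                out.setRGB(x, y, lum < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

A higher threshold turns more of the intermediate gray pixels black, which may close thin light seams between adjacent squares, at the cost of slightly thicker modules. It would be called on the grayscale result of `renderImageWithDPI`, for example `ManualThreshold.toBinary(grayImage, 200)`.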

Is the way I am using the renderer causing this problem, or is it simply a bug in the software?

I am attaching a test project that reproduces the behavior.

Attachments

page-3.pdf (29/Apr/19 09:06, 132 kB, amiladi)
page-7.pdf (29/Apr/19 09:06, 150 kB, amiladi)
page-7-1.jpeg (29/Apr/19 09:06, 389 kB, amiladi)
page-3-1.jpeg (29/Apr/19 09:06, 338 kB, amiladi)
PdfBoxTestCase.zip (29/Apr/19 09:08, 717 kB, amiladi)

Issue Links

relates to: PDFBOX-4435 Poor quality printing of PDF label (Closed)

People

Assignee: Unassigned

Reporter: amiladi

Votes: 0

Watchers: 2

Dates

Created: 29/Apr/19 09:11

Updated: 30/Apr/19 17:16

Resolved: 30/Apr/19 17:16