[PDFBOX-4296] Question: Performance - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Trivial
Resolution: Incomplete
Affects Version/s: 2.0.11
Fix Version/s: None
Component/s: Rendering
Labels:
- optimization
- performance

Description

Hi Team.

We use a tool we built using PDFBox to extract text for about 10k pages per day. Then we have another tool to extract images using Poppler.

We would like to use PDFBox for both tasks, but sadly we see a performance hit on the order of 3x when using PDFBox.

Do you have any backlog, technical debt, or ideas on how to improve performance?

We have tried -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true and that made image generation much slower.
We have set System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider") in code.
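For reference, the two settings mentioned above could be applied together at startup roughly as follows. This is only a minimal sketch: the class name `RenderingSetup` and the method `configure` are hypothetical, the property names and values are the ones quoted in this report, and it assumes the properties are set before any AWT or color-management classes are loaded. The KCMS provider is not available on every JDK, so the first property may have no effect on some runtimes.

```java
// Hypothetical helper; only the property names and values come from the report above.
public class RenderingSetup {
    static final String CMM_KEY = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";
    static final String PURE_JAVA_CMYK_KEY =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    // Call once, as early as possible, before any rendering code runs.
    static void configure(boolean pureJavaCmyk) {
        // Ask Java 2D to use the KCMS color management module.
        System.setProperty(CMM_KEY, KCMS_PROVIDER);
        // Equivalent to passing -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=...
        System.setProperty(PURE_JAVA_CMYK_KEY, Boolean.toString(pureJavaCmyk));
    }

    public static void main(String[] args) {
        // We saw slower image generation with the pure Java CMYK conversion, so keep it off.
        configure(false);
        System.out.println(CMM_KEY + "=" + System.getProperty(CMM_KEY));
        System.out.println(PURE_JAVA_CMYK_KEY + "=" + System.getProperty(PURE_JAVA_CMYK_KEY));
    }
}
```

Keeping both switches in one place makes it easier to toggle them between benchmark runs and compare timings.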

We use image libraries from twelvemonkeys, pdfbox and the standard jai project.

I've read in the code that we do double writes for images that use transparency, which might be a culprit.

I have been allowed to put some time into the project if we have some solid leads or a roadmap to reach better performance.

Hope it's okay to track this issue here instead of a question on the mailing list.

Best regards

Daniel

People

Assignee: Unassigned
Reporter: Daniel Persson
Votes: 0
Watchers: 2

Dates

Created: 21/Aug/18 09:33
Updated: 24/Feb/19 09:56
Resolved: 29/Aug/18 06:59