[TIKA-2533] Improve embedded image extraction in PDFs - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

On PDFBOX-4043, Tilman Hausherr pinged us to fix a parallel bug in our extraction of embedded images. Because our image extraction code is copied and pasted from PDFBox's ExtractImages, we should fix that bug and consider refactoring our PDFParser a bit so that future changes to ExtractImages are easier to carry over.