[TIKA-2303] PDFParser with optional bookmarks text extraction - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.14
Fix Version/s: None
Component/s: parser
Labels:
- option
- parser
- pdf

Description

I would like to parse a PDF without extracting its bookmarks and outlines.

I was thinking about adding a new parameter to PDFParserConfig, an option such as 'ExtractBookmarks', and checking it in 'AbstractPDF2XHTML' before the bookmark text is written.
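A rough sketch of the proposed flag, assuming it works as a simple on/off switch that defaults to the current behavior. `BookmarkConfig`, `OutlineEmitter` and `setExtractBookmarks` are hypothetical names chosen for illustration; they are not Tika's actual `PDFParserConfig` or `AbstractPDF2XHTML` API, and the real change would check the option wherever `AbstractPDF2XHTML` writes outline text.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL stand-in for the proposed PDFParserConfig option.
// Not Tika's real class; names are illustrative only.
class BookmarkConfig {
    // Defaults to true, so output is unchanged unless a caller opts out.
    private boolean extractBookmarks = true;

    boolean isExtractBookmarks() {
        return extractBookmarks;
    }

    void setExtractBookmarks(boolean extractBookmarks) {
        this.extractBookmarks = extractBookmarks;
    }
}

// HYPOTHETICAL stand-in for the point in AbstractPDF2XHTML where
// outline (bookmark) text would be written after the page text.
class OutlineEmitter {
    private final BookmarkConfig config;

    OutlineEmitter(BookmarkConfig config) {
        this.config = config;
    }

    // Returns the page text, followed by bookmark titles only when enabled.
    List<String> emit(List<String> pageText, List<String> bookmarkTitles) {
        List<String> out = new ArrayList<>(pageText);
        if (config.isExtractBookmarks()) {
            out.addAll(bookmarkTitles);
        }
        return out;
    }
}

public class BookmarkOptionDemo {
    public static void main(String[] args) {
        List<String> pages = List.of("page text");
        List<String> bookmarks = List.of("Chapter 1");

        BookmarkConfig config = new BookmarkConfig();
        OutlineEmitter emitter = new OutlineEmitter(config);
        System.out.println(emitter.emit(pages, bookmarks)); // prints [page text, Chapter 1]

        config.setExtractBookmarks(false); // opt out of bookmark text
        System.out.println(emitter.emit(pages, bookmarks)); // prints [page text]
    }
}
```

Defaulting the flag to true keeps existing output identical for current users; only callers who explicitly disable it lose the bookmark text.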

I can implement this and would like to submit a patch with the change.

Thanks in advance.

Issue Links

links to

GitHub Pull Request #157

People

Assignee: Unassigned

Reporter: Pablo Palazon

Votes: 0

Watchers: 4

Dates

Created: 16/Mar/17 15:33

Updated: 04/Jun/18 12:09

Resolved: 04/Jun/18 12:09

Time Tracking

Estimated: 10m

Remaining: 10m

Logged: Not Specified