[TIKA-3071] tika-server's unpacker should pass the parent parser into the parsecontext to be used for inline parsing - ASF JIRA

Details

Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.25
Component/s: None
Labels: None

Description

A small handful of our parsers call other parsers "inline" to handle portions of a document. By inline, I mean that they aren't calling the other parsers on attachments or embedded objects; rather, they are calling the parsers on material that is meant to be understood as part of the main document, e.g. TesseractOCRParser on a rendered image of a PDF page.

On TIKA-3069, carina.antunes pointed out that there is a bit of a disconnect in how the /unpack/all endpoint returns the container file's text for PDFs that contain images.
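
To illustrate the idea in the title, here is a minimal, self-contained sketch of passing a parent parser through a type-keyed context so that a container parser can call it inline. Everything in it is a simplified stand-in and an assumption for illustration: `Parser`, `Context`, `ParentParser`, `PdfParser`, `OcrParser`, and `unpack` are hypothetical and are not Tika's classes or tika-server's actual code. In Tika itself, the analogous calls would be `ParseContext.set(Parser.class, parser)` on the caller's side and `context.get(Parser.class)` inside the parser.

```java
import java.util.HashMap;
import java.util.Map;

public class InlineParserSketch {

    /** Hypothetical stand-in for a parser: turns a named input into text. */
    interface Parser {
        String parse(String name, Context context);
    }

    /** Hypothetical stand-in for a type-keyed parse context. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast returns null when no entry is registered for the key.
            return key.cast(entries.get(key));
        }
    }

    /** Stand-in for an OCR parser that handles rendered page images. */
    static final class OcrParser implements Parser {
        public String parse(String name, Context context) {
            return "ocr(" + name + ")";
        }
    }

    /** Stand-in for a container parser that OCRs a rendered page inline. */
    static final class PdfParser implements Parser {
        public String parse(String name, Context context) {
            String text = "text(" + name + ")";
            // Look up the parent parser; without it, no inline parsing happens.
            Parser inline = context.get(Parser.class);
            if (inline == null) {
                return text;
            }
            // The rendered page is part of the main document, not an attachment.
            return text + " " + inline.parse("page1.png", context);
        }
    }

    /** Stand-in for the parent parser: dispatches on the file suffix. */
    static final class ParentParser implements Parser {
        public String parse(String name, Context context) {
            Parser delegate = name.endsWith(".png") ? new OcrParser() : new PdfParser();
            return delegate.parse(name, context);
        }
    }

    /** Hypothetical unpack step: optionally registers the parent parser. */
    public static String unpack(String name, boolean passParent) {
        Parser parent = new ParentParser();
        Context context = new Context();
        if (passParent) {
            context.set(Parser.class, parent);
        }
        return parent.parse(name, context);
    }

    public static void main(String[] args) {
        System.out.println(unpack("a.pdf", true));  // text(a.pdf) ocr(page1.png)
        System.out.println(unpack("a.pdf", false)); // text(a.pdf)
    }
}
```

In this sketch, the only difference between the two calls is whether the parent parser was registered in the context; when it is missing, the container parser silently skips the inline step and only the container's own text comes back.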

People

Assignee: Tim Allison
Reporter: Tim Allison
Votes: 0
Watchers: 2

Dates

Created: 12/Mar/20 15:39
Updated: 12/Mar/20 17:26
Resolved: 12/Mar/20 15:57