Tika / TIKA-3788

Allow embedded exceptions and warnings to percolate to the parent's metadata


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.1
    • Component/s: None
    • Labels: None

    Description

      As part of work on TIKA-3787, I'll add a ParseRecord to the ParseContext. Parsers that handle embedded files can use it to record caught exceptions and warning messages. The CompositeParser keeps track of the depth of its parse, and when the depth returns to 0, it writes these exceptions and warnings to the Metadata object.

      I would still highly recommend /rmeta, -J, or the RecursiveParserWrapper, but this new capability adds some functionality to the standard /tika endpoint (with JSON output) and, programmatically, to the AutoDetectParser.

      Because this information is added to the Metadata object after the parse, it will not come through in streaming contexts where the metadata is written to the XHTML before the content of the file is parsed. So, this will not add any benefit to /tika (text/html).
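
      The depth-tracking idea described above can be sketched roughly as follows. This is a toy model in plain Java, not Tika code: the class names (ParseRecordSketch, SimpleRecord, Doc) and the metadata keys ("embedded-exception", "embedded-warning") are hypothetical and chosen only for illustration. It shows a shared record collecting exceptions from embedded documents and the outermost parse copying them into the parent's metadata once the depth returns to 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and are NOT Tika's API.
class ParseRecordSketch {

    /** Shared record that travels with the parse (stand-in for a ParseContext entry). */
    static class SimpleRecord {
        int depth = 0;
        final List<String> exceptions = new ArrayList<>();
        final List<String> warnings = new ArrayList<>();
    }

    /** A document with optional embedded children; 'broken' simulates a parse failure. */
    static class Doc {
        final String name;
        final boolean broken;
        final List<Doc> children = new ArrayList<>();

        Doc(String name, boolean broken) {
            this.name = name;
            this.broken = broken;
        }
    }

    /** Composite-style entry point: tracks depth and flushes the record at depth 0. */
    static void parse(Doc doc, Map<String, List<String>> metadata, SimpleRecord record) {
        record.depth++;
        try {
            if (doc.broken) {
                throw new IllegalStateException("could not parse " + doc.name);
            }
            for (Doc child : doc.children) {
                try {
                    // Embedded documents get their own throwaway metadata in this sketch.
                    parse(child, new HashMap<>(), record);
                } catch (RuntimeException e) {
                    // Caught so the parent parse can continue; recorded for later.
                    record.exceptions.add(e.getMessage());
                }
            }
        } finally {
            record.depth--;
            // Only the outermost parse (depth back at 0) writes to the metadata.
            if (record.depth == 0) {
                if (!record.exceptions.isEmpty()) {
                    metadata.put("embedded-exception", new ArrayList<>(record.exceptions));
                }
                if (!record.warnings.isEmpty()) {
                    metadata.put("embedded-warning", new ArrayList<>(record.warnings));
                }
            }
        }
    }

    public static void main(String[] args) {
        Doc outer = new Doc("outer.zip", false);
        outer.children.add(new Doc("ok.txt", false));
        outer.children.add(new Doc("bad.doc", true));

        Map<String, List<String>> metadata = new HashMap<>();
        parse(outer, metadata, new SimpleRecord());
        System.out.println(metadata);
    }
}
```

      Note that the metadata is only filled in the finally block of the outermost call, that is, after all content has been handled. That is the same reason the real feature cannot help in streaming output where metadata is emitted before the content.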

          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)