Tika / TIKA-1420

Add Metadata Extraction to Arbitrary Parsers


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7
    • Component/s: parser
    • Labels:

      Description

      Suppose you wish to extract information from arbitrary file types and add it to a Metadata object. This type of task is best handled by a Handler. However, Handlers do not have access to the Metadata object passed to a Parser.

      So, I see a few ways we could do this using existing functionality.

      1) Make an intermediate XML representation of the desired metadata in a handler, then convert the XML to the Metadata after parsing.

      2) Create a second Parser which extracts the desired information.
      a) Assume the Handler passed to this Parser is already filled with content. We could then read that content from the Handler and populate the Metadata directly.
      b) Create a new Stream in the first Parser and pass it to the second Parser, which in turn populates the Metadata.

      None of these options seems ideal. Is there a better way to handle this scenario? Or, can we create some sort of wrapper for a Handler that accepts a Metadata object and populates it directly?
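
      To make the wrapper idea concrete, here is a minimal sketch of what I have in mind. It is not the eventual fix and it does not use Tika classes: a plain Map stands in for the Metadata object, and the class name MetadataPopulatingHandler, the "emails" key, and the email pattern are all hypothetical choices for illustration. The wrapper forwards every SAX event to the wrapped handler unchanged and records what it finds on the side.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical wrapper: forwards SAX events and records matches in a metadata map. */
class MetadataPopulatingHandler extends DefaultHandler {
    // Deliberately simple pattern, for illustration only.
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    private final ContentHandler delegate;              // the wrapped handler
    private final Map<String, List<String>> metadata;   // stand-in for Tika's Metadata
    private final StringBuilder text = new StringBuilder();

    MetadataPopulatingHandler(ContentHandler delegate,
                              Map<String, List<String>> metadata) {
        this.delegate = delegate;
        this.metadata = metadata;
    }

    @Override
    public void startDocument() throws SAXException {
        delegate.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        delegate.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        text.append(ch, start, length);   // keep a copy of the text for scanning
        delegate.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        text.append(' ');                 // keep text of adjacent elements apart
        delegate.endElement(uri, localName, qName);
    }

    @Override
    public void endDocument() throws SAXException {
        // Scan the collected text once and write every match into the metadata.
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            metadata.computeIfAbsent("emails", k -> new ArrayList<>()).add(m.group());
        }
        delegate.endDocument();
    }

    /** Parses an XML string through the wrapper and returns the populated map. */
    static Map<String, List<String>> extract(String xml) throws Exception {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        ContentHandler downstream = new DefaultHandler(); // stand-in for the real handler
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new MetadataPopulatingHandler(downstream, metadata));
        return metadata;
    }
}
```

      With something like this, the caller would wrap whatever Handler it already passes to the Parser and read the extracted values from the metadata once parsing finishes. No second Parser, second Stream, or intermediate XML would be needed.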


            People

            • Assignee:
              tpalsulich Tyler Bui-Palsulich
              Reporter:
              tpalsulich Tyler Bui-Palsulich
            • Votes:
              0
              Watchers:
              4
