Tika / TIKA-3352

Add a handler for json output from the /tika endpoint

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.27
    • Component/s: None
    • Labels: None

    Description

      I've been focusing mostly on the /rmeta endpoint. However, for the many users who aren't enthusiasts of the wild and crazy things that can happen with embedded files (i.e., the rest of the world), it would be useful to have some of the advantages of the /rmeta endpoint without the complexity.

      This would allow text + metadata in the response (for those who don't want to parse the XHTML). It would include "late metadata", that is, metadata that is added only after content extraction has begun and so does not appear in our usual XHTML output. It would also allow the stacktrace to be stored in a field, as is done in /rmeta (if the s/-stackTrace command-line option is selected), so that users could get what they can from a failed parse and align parse exceptions with the detected mime type.

      Unlike /rmeta, this proposal would not include stacktraces from embedded files.
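
      As a rough illustration of how a client might consume such a response, here is a minimal sketch. It assumes the JSON output is a single flat object that carries the extracted text under "X-TIKA:content" and any stacktrace under a key starting with "X-TIKA:EXCEPTION:". Those key names follow /rmeta conventions and are assumptions, as are the helper names `build_request` and `split_response`, the local server URL, and the sample payload.

```python
import json
import urllib.request

# Assumed key names, borrowed from /rmeta conventions; not confirmed by this issue.
CONTENT_KEY = "X-TIKA:content"
EXCEPTION_PREFIX = "X-TIKA:EXCEPTION:"


def build_request(data: bytes, base_url: str = "http://localhost:9998") -> urllib.request.Request:
    """Build (but do not send) a PUT to /tika that asks for JSON output."""
    return urllib.request.Request(
        base_url + "/tika",
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def split_response(body: str):
    """Split a flat JSON response into (text, metadata, exceptions)."""
    doc = json.loads(body)
    text = doc.get(CONTENT_KEY, "")
    exceptions = {k: v for k, v in doc.items() if k.startswith(EXCEPTION_PREFIX)}
    metadata = {
        k: v for k, v in doc.items()
        if k != CONTENT_KEY and k not in exceptions
    }
    return text, metadata, exceptions


# Hypothetical payload for a parse that failed partway through.
sample = json.dumps({
    "Content-Type": "application/pdf",
    "X-TIKA:content": "partial text",
    "X-TIKA:EXCEPTION:runtime": "java.io.IOException: truncated stream",
})

request = build_request(b"%PDF-1.4 ...")
text, metadata, exceptions = split_response(sample)
```

      Keeping the stacktrace in its own field next to Content-Type is what would let a caller line up a parse exception with the detected mime type while still using whatever text was extracted.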

          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)
            Votes: 0
            Watchers: 2
