Tika / TIKA-2662

Add a streaming output option for the JSON serialization

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.19, 2.0.0
    • Component/s: None
    • Labels: None

      Description

      Depending on how the ForkParser is configured, it could be useful, both there and in tika-batch, to write out the output for each embedded file as soon as the parse of that embedded file completes, rather than caching the entire output in memory.

      The downside is that the main document will now appear at the bottom of the list of metadata objects rather than at the top. We can rearrange the list when we deserialize, but anyone not using our deserialization will see the changed order. Because this is a breaking change, I'll make streaming optional.
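
      The idea above can be illustrated with a minimal sketch. It is not Tika's actual serializer: the class `StreamingMetadataSketch`, its nested `StreamingWriter`, and the helper `mainDocumentFirst` are hypothetical names invented for this illustration, and each metadata object is simplified to a flat map of strings. The writer emits one JSON object per completed parse instead of buffering the whole list, and the helper shows the reordering a reader could apply, assuming the main document is always the last element written.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of streaming a metadata list; not Tika's real API. */
public class StreamingMetadataSketch {

    /** Writes a JSON array one object at a time, so no list is held in memory. */
    public static final class StreamingWriter implements AutoCloseable {
        private final Writer out;
        private boolean firstObject = true;

        public StreamingWriter(Writer out) throws IOException {
            this.out = out;
            out.write("[");
        }

        /** Call once the parse of a single (embedded or main) document has completed. */
        public void add(Map<String, String> metadata) throws IOException {
            if (!firstObject) {
                out.write(",");
            }
            firstObject = false;
            out.write("{");
            boolean firstField = true;
            for (Map.Entry<String, String> e : metadata.entrySet()) {
                if (!firstField) {
                    out.write(",");
                }
                firstField = false;
                out.write(quote(e.getKey()) + ":" + quote(e.getValue()));
            }
            out.write("}");
            out.flush(); // hand the object downstream immediately
        }

        @Override
        public void close() throws IOException {
            out.write("]");
            out.flush();
        }
    }

    /** Minimal JSON string escaping for the sketch. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Moves the last element (assumed to be the main document) back to the front. */
    public static <T> List<T> mainDocumentFirst(List<T> streamed) {
        List<T> reordered = new ArrayList<>(streamed);
        if (reordered.size() > 1) {
            reordered.add(0, reordered.remove(reordered.size() - 1));
        }
        return reordered;
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        try (StreamingWriter writer = new StreamingWriter(sink)) {
            // Embedded files finish first; the container finishes last.
            writer.add(Map.of("name", "embedded-1.png"));
            writer.add(Map.of("name", "embedded-2.txt"));
            writer.add(Map.of("name", "container.docx"));
        }
        System.out.println(sink);
        // prints [{"name":"embedded-1.png"},{"name":"embedded-2.txt"},{"name":"container.docx"}]
    }
}
```

      A consumer that parses the streamed array with its own JSON library would see the container last; applying something like `mainDocumentFirst` after reading restores the earlier main-document-first order.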

            People

            • Assignee: Unassigned
            • Reporter: Tim Allison (tallison@apache.org)
