  Tika / TIKA-2585

TikaInputStream support for resetting via a factory of InputStreams

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0, 1.17
    • Fix Version/s: 1.18
    • Component/s: parser
    • Labels:
      None

      Description

      As raised in the 2.0 breaking changes thread, the only way Tika currently has to fully read an InputStream multiple times is TikaInputStream.getFile(), which spools to a temp file if the stream is not already file-based. (Reading the first few KB is handled via buffering and mark/reset, but that doesn't scale to huge files.)
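      To illustrate the buffering approach mentioned in parentheses, here is a minimal sketch using only java.io. The class and method names (MarkResetSketch, peek) are hypothetical and are not part of Tika. It shows why mark/reset suits a small prefix: the stream has to hold every byte read since the mark, so the cost grows with the amount re-read.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of Tika.
final class MarkResetSketch {

    /**
     * Reads up to `limit` leading bytes, then rewinds the stream so the
     * next reader starts from the beginning again.
     */
    static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit); // the stream must remember up to `limit` bytes
        byte[] head = new byte[limit];
        int total = 0;
        while (total < limit) {
            int n = in.read(head, total, limit - total);
            if (n == -1) {
                break; // input is shorter than `limit`
            }
            total += n;
        }
        in.reset(); // rewind to the marked position
        if (total == limit) {
            return head;
        }
        byte[] trimmed = new byte[total];
        System.arraycopy(head, 0, trimmed, 0, total);
        return trimmed;
    }
}
```

      Wrapping a source in java.io.BufferedInputStream is one way to get a stream that supports mark/reset. For a multi-gigabyte input the buffer would have to hold the whole file, which is the scaling problem described above.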

      In some cases, grabbing a fresh InputStream is actually cheaper than having Tika spool to a temp file, but a caller has no way to express that.

      So, before we make much more use of re-processing the whole input several times (e.g. for the augmenting parsers and fallback parsers), we should provide a way for callers to supply new InputStream instances on demand instead.
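      The idea can be sketched roughly as follows. This is not the API that was committed for this issue: StreamSupplier and countBytesPerPass are hypothetical names chosen for illustration. The caller hands over a callback that opens a fresh stream, and each full pass over the input asks for a new one instead of spooling to disk.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch for illustration only; names are not Tika's.
final class StreamFactorySketch {

    /** Caller-supplied callback: each call returns a new stream positioned at byte 0. */
    @FunctionalInterface
    interface StreamSupplier {
        InputStream open() throws IOException;
    }

    /**
     * Reads the whole input `passes` times, requesting a fresh stream for
     * each pass, and returns the number of bytes seen in each pass.
     */
    static long[] countBytesPerPass(StreamSupplier supplier, int passes) throws IOException {
        long[] counts = new long[passes];
        byte[] buf = new byte[8192];
        for (int i = 0; i < passes; i++) {
            // try-with-resources closes each stream before the next pass starts
            try (InputStream in = supplier.open()) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    counts[i] += n;
                }
            }
        }
        return counts;
    }
}
```

      A caller whose data lives in memory or behind a cheap remote fetch could pass something like () -> new ByteArrayInputStream(data). The design choice is that only the caller knows whether re-opening the source is cheaper than a temp file.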


            People

            • Assignee: Unassigned
            • Reporter: Nick Burch (gagravarr)
            • Votes: 0
            • Watchers: 4
