Tika / TIKA-2849

TikaInputStream copies the input stream locally


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.20
    • Fix Version/s: 1.21
    • Component/s: None
    • Labels: None

    Description

      When calling "tika.detect(stream, name)" with a stream that is a "TikaInputStream", execution reaches "TikaInputStream#getPath", which calls "Files.copy(in, path, REPLACE_EXISTING);". This is very bad for us: the input stream may be, as in our case, backed by a network file that is tens or hundreds of gigabytes in size, and copying it to local disk is a huge waste of resources, to say the least. Why does Tika do that, and can I prevent it? Or is this something that has to be fixed in Tika itself?
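
      To illustrate the cost described above, here is a minimal sketch that uses only the JDK, not Tika. "spoolToTempFile" mirrors the "Files.copy(in, path, REPLACE_EXISTING)" call quoted in the report and drains the whole stream to disk. "peekPrefix" is a hypothetical alternative that reads a bounded prefix with mark/reset. The class and method names are made up for illustration, and this is not claimed to be how Tika behaves internally or how the issue was fixed.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

public class SpoolVsPeek {

    // Same call as quoted in the report: the ENTIRE stream is written to a
    // local temp file. Returns the number of bytes copied.
    static long spoolToTempFile(InputStream in) throws IOException {
        Path path = Files.createTempFile("spool-", ".tmp");
        try {
            return Files.copy(in, path, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(path); // clean up the illustrative temp file
        }
    }

    // Hypothetical alternative (an assumption, not Tika's API): read at most
    // `limit` bytes, then rewind so the caller can still consume the stream.
    static byte[] peekPrefix(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit);
        byte[] buf = new byte[limit];
        int total = 0;
        try {
            while (total < limit) {
                int n = in.read(buf, total, limit - total);
                if (n < 0) {
                    break; // end of stream before `limit` bytes
                }
                total += n;
            }
        } finally {
            in.reset(); // rewind to the marked position
        }
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // stand-in for a much larger remote file

        long copied = spoolToTempFile(new ByteArrayInputStream(data));
        System.out.println("bytes written to disk by full copy: " + copied);

        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] prefix = peekPrefix(in, 16);
        System.out.println("bytes read by bounded peek: " + prefix.length);
    }
}
```

      The full copy costs disk space and I/O proportional to the stream length, while the bounded peek touches at most "limit" bytes regardless of how large the underlying file is.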


          People

            Assignee: Tim Allison (tallison)
            Reporter: Boris Petrov (boris-petrov)
            Votes: 0
            Watchers: 7
