Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 1.20
- None
- None
Description
When calling "tika.detect(stream, name)" with a stream that is a "TikaInputStream", execution reaches "TikaInputStream#getPath", which calls "Files.copy(in, path, REPLACE_EXISTING);" and copies the entire stream to a local file. That is very bad for us: the input stream can be, as in our case, a stream over a network file that is tens or hundreds of gigabytes in size, so copying it locally is a huge waste of resources, to say the least. Why does it do that, and can I prevent it? Or is this something that has to be fixed in Tika?
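To illustrate the behavior I would expect instead, here is a minimal sketch that inspects only a bounded prefix of the stream and then rewinds it, so nothing is copied to disk. This is not Tika code: "PrefixSniffer", "peek", "guessType" and the 8192-byte limit are hypothetical names and values chosen for the example, and the magic-byte check is only a stand-in for a real detector.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: reads a bounded prefix and rewinds.
final class PrefixSniffer {
    // Assumed limit for this sketch; a real detector chooses its own prefix length.
    static final int MAX_PREFIX = 8192;

    /** Reads up to MAX_PREFIX bytes, then resets the stream to where it started. */
    static byte[] peek(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MAX_PREFIX);
        try {
            byte[] buf = new byte[MAX_PREFIX];
            int total = 0;
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n < 0) {
                    break; // end of stream reached before the limit
                }
                total += n;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // the caller can still read the stream from the start
        }
    }

    /** Very rough guess from magic bytes; a placeholder for a real detector. */
    static String guessType(byte[] prefix) {
        if (startsWith(prefix, new byte[] {'P', 'K', 3, 4})) {
            return "application/zip";
        }
        if (startsWith(prefix, new byte[] {'%', 'P', 'D', 'F'})) {
            return "application/pdf";
        }
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a network stream wrapped in a "BufferedInputStream", "peek" would pull at most 8192 bytes over the wire, no matter how large the remote file is. The trade-off is that a prefix alone cannot tell container formats apart (for example, a plain zip from a zip-based document), which may be why a detector would want the whole file.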
Issue Links
- is related to: TIKA-2853 Consider applying NaiveBayes or similar simple ML to streaming zip detector (Open)