Description
In our application, Tika is used as part of a Tomcat webapp. Tomcat sets its temp folder ($CATALINA_HOME/temp) as "java.io.tmpdir". The MP4Parser creates files in java.io.tmpdir.
The files created by the MP4Parser are never deleted from temp/. Ex: MediaDataBox10544109451805035303org.mp4parser.boxes.iso14496.part12.MediaDataBox@77cb1ee8
Oddly, there are no errors in the logs: nothing about files that cannot be deleted or cannot be found.
Other processes in our application need to create their own files in temp/, so we can't simply delete everything in that folder.
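As a stopgap on our side, something like the following sketch could remove only the leaked files and leave the rest of temp/ alone. It assumes the leaked files always start with "MediaDataBox" (as in the example above) and that any such file older than the given age is no longer in use. The class name, method name, and one-hour threshold are made up for illustration; this is a workaround, not a fix for the leak itself.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Tika or mp4parser.
class MediaDataBoxCleanup {

    /**
     * Deletes regular files directly inside tempDir whose name starts with
     * "MediaDataBox" and whose last-modified time is older than minAge.
     * Returns the number of files deleted.
     */
    static int deleteStale(Path tempDir, Duration minAge) throws IOException {
        Instant cutoff = Instant.now().minus(minAge);
        int deleted = 0;
        // The glob is matched against the file name only, so other temp files are skipped.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(tempDir, "MediaDataBox*")) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)
                        && Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)
                        && Files.deleteIfExists(file)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the leaked files live in java.io.tmpdir, as described above.
        Path tempDir = Paths.get(System.getProperty("java.io.tmpdir"));
        int deleted = deleteStale(tempDir, Duration.ofHours(1));
        System.out.println("Deleted " + deleted + " stale MediaDataBox file(s)");
    }
}
```

The age check is there so that a file belonging to a parse that is still running is not removed; the threshold would need to be longer than our slowest parse.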
I assume from TIKA-1040, TIKA-1361, and TIKA-3084 that most file deletion issues in the MP4Parser have been fixed. Could this be a little gremlin in CentOS or in Tomcat?
I have tried using TemporaryResources (i.e., replacing the "TikaInputStream.get" call in the code below with TikaInputStream.get(InputStream, TemporaryResources)) to put the parser's temporary files in a folder that we control, but to no avail. The "parse" method of Tika's MP4Parser creates a new instance of TemporaryResources, so the instance that I created is never used. The default TemporaryResources would use java.io.tmpdir anyway, right?
So, why aren't these files deleted?
And, while we are on the subject, there should be a way to set a temporary-file folder that parsers (and their dependencies) actually use. How can a user-defined TemporaryResources be useful if the parser ignores it?
Relevant code:
Parser parser = new AutoDetectParser(); // injected by Spring
Path input = ...; // some mp4 audio file
Path output = ...;
final Metadata metadata = new Metadata();
try (InputStream stream = TikaInputStream.get(input, metadata);
     OutputStream outputstream = new FileOutputStream(output.toFile());
     OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputstream, "UTF-8")) {
    ParseContext parseContext = new ParseContext();
    parser.parse(stream, new BodyContentHandler(outputStreamWriter), metadata, parseContext);
    // do something with the metadata and the output
}
Note that I also tried to set java.io.tmpdir to another folder, programmatically. That had no effect either. Since the application needs to use Tomcat's temp folder for other processing, setting java.io.tmpdir on the command line is not an option.
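My guess about why the programmatic change had no effect: as far as I understand, the JDK resolves java.io.tmpdir once and caches the result, so a later System.setProperty call is not seen by Files.createTempFile. The small probe below is how I would check that on a given JVM. It is only a sketch; the class and method names are made up, and it says nothing about how mp4parser itself picks its directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical probe, not part of Tika or mp4parser.
class TmpDirProbe {

    /**
     * Returns true if a temp file created after changing java.io.tmpdir
     * still lands in the directory that was in effect before the change.
     */
    static boolean changeIsIgnored(Path otherDir) throws IOException {
        String original = System.getProperty("java.io.tmpdir");
        // First call: ensures the JDK has already resolved its temp directory.
        Path before = Files.createTempFile("probe", ".tmp");
        Path after = null;
        try {
            System.setProperty("java.io.tmpdir", otherDir.toString());
            after = Files.createTempFile("probe", ".tmp");
            return after.getParent().equals(before.getParent());
        } finally {
            // Restore the property and clean up the probe files.
            System.setProperty("java.io.tmpdir", original);
            Files.deleteIfExists(before);
            if (after != null) {
                Files.deleteIfExists(after);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path otherDir = Files.createTempDirectory("other-tmp");
        try {
            System.out.println("Change to java.io.tmpdir ignored: " + changeIsIgnored(otherDir));
        } finally {
            Files.deleteIfExists(otherDir);
        }
    }
}
```

If the probe reports that the change is ignored, then in a webapp (where Tomcat has long since created temp files) the only reliable way to redirect a library's temp files would be for the library to accept an explicit directory.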