When converting Word97 documents (*.doc), tika-server reproducibly leaves behind temporary files.
Steps to reproduce:
- Start tika-app-1.5.jar in --server mode
- Send a *.doc file to the server for conversion
- Stop tika-server using CTRL+C or kill -15
No exceptions are thrown, and the plaintext is extracted correctly from the document, but temporary files are left behind every single time.
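To see exactly which files are left behind, the temp directory can be compared before and after the steps above. The following is only a rough sketch, not part of Tika: the class TempDirDiff and its methods are made up for illustration, and it assumes the server JVM writes to the same default java.io.tmpdir as this helper. Other processes writing to that directory at the same time will also show up in the result.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of Tika): lists entries that appeared in a
// directory between two snapshots.
public class TempDirDiff {

    /** Returns the names of all entries directly inside dir (not recursive). */
    public static Set<String> snapshot(Path dir) throws IOException {
        Set<String> names = new TreeSet<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        return names;
    }

    /** Returns the names present in after but missing from before. */
    public static Set<String> leaked(Set<String> before, Set<String> after) {
        Set<String> diff = new TreeSet<>(after);
        diff.removeAll(before);
        return diff;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the server uses the default temp directory of the JVM.
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        Set<String> before = snapshot(tmp);
        System.out.println("Run the conversion, stop the server, then press Enter.");
        System.in.read();
        Set<String> after = snapshot(tmp);
        System.out.println("New entries in " + tmp + ": " + leaked(before, after));
    }
}
```

Start it before step 1 and press Enter after step 3. Any names it prints are candidates for the leaked files.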
This is a major issue in a production environment that converts thousands of documents a day. Our temp directories are filling up rapidly, and we had to configure cron jobs to clean up after Tika on most of our production servers. I wasn't able to reproduce this issue using tika-app-1.5.jar in non-server mode, but booting up a JVM for every single conversion is just too slow.