Currently, tika-server is vulnerable to OOMs, infinite loops and memory leaks. I see two ways of making it robust:
1) use the ForkParser
2) have tika-server spawn a child process that actually runs the server, and put a watcher thread in the child that kills the child on OOM, on timeout, or after x files. The parent process can then restart the child when it dies.
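A minimal sketch of the child-side watcher thread from option 2, assuming one parse at a time per child. `ChildWatchdog`, `startParse`/`endParse` and the limits are hypothetical names, not existing tika-server API. The kill action is pluggable so it can be tested; in a real child it would likely be `System.exit(1)`. OOMs are not handled here. One option is to launch the child JVM with `-XX:+ExitOnOutOfMemoryError` (HotSpot, JDK 8u92+), so that the JVM exits and the parent restarts it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical watchdog running inside the child JVM (not tika-server API).
class ChildWatchdog implements Runnable {
    private static final long IDLE = -1L; // marker: no parse in progress

    private final long parseTimeoutMillis;  // max wall-clock time for one parse
    private final int maxFiles;             // recycle the child after this many files
    private final long checkIntervalMillis; // how often the watcher wakes up
    private final Runnable killAction;      // e.g. () -> System.exit(1) in a real child

    private final AtomicLong parseStartedAt = new AtomicLong(IDLE);
    private final AtomicInteger filesProcessed = new AtomicInteger();

    ChildWatchdog(long parseTimeoutMillis, int maxFiles,
                  long checkIntervalMillis, Runnable killAction) {
        this.parseTimeoutMillis = parseTimeoutMillis;
        this.maxFiles = maxFiles;
        this.checkIntervalMillis = checkIntervalMillis;
        this.killAction = killAction;
    }

    // Call from the request handler just before parsing starts.
    public void startParse() {
        parseStartedAt.set(System.currentTimeMillis());
    }

    // Call from the request handler (in a finally block) when parsing ends.
    public void endParse() {
        parseStartedAt.set(IDLE);
        filesProcessed.incrementAndGet();
    }

    // True if the current parse has run too long, or if the file budget is
    // spent and no parse is in flight (so no request is cut off mid-parse).
    public boolean shouldKill(long nowMillis) {
        long started = parseStartedAt.get();
        boolean timedOut = started != IDLE && nowMillis - started > parseTimeoutMillis;
        boolean budgetSpent = started == IDLE && filesProcessed.get() >= maxFiles;
        return timedOut || budgetSpent;
    }

    // Watcher loop; intended to run on a daemon thread in the child.
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            if (shouldKill(System.currentTimeMillis())) {
                killAction.run(); // the parent sees the child exit and restarts it
                return;
            }
            try {
                Thread.sleep(checkIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Recycling only while idle is a design choice in this sketch: a child that has hit its file budget finishes the current request before exiting, while a parse that exceeds the timeout is killed immediately because it may be stuck in an infinite loop.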
I somewhat prefer 2) so that we don't have to pass the InputStream twice. I propose 2), and I propose making it optional in Tika 1.x but the default in Tika 2.x. We could also add a status ping from the parent to the child in case the child gets stuck in a stop-the-world GC (h/t Boaz Leskes).
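A rough sketch of the parent side of option 2, combining the restart with the status ping. `ChildSupervisor`, the `Child` interface, `ping()` and the launcher `Supplier` are all hypothetical. A `java.lang.Process`-backed `Child` could use `Process.isAlive()` and `Process.destroyForcibly()`, while `ping()` assumes some status channel in the child (for example a lightweight HTTP endpoint) that this sketch does not define.

```java
import java.util.function.Supplier;

// Hypothetical parent-side supervisor (illustrative, not tika-server API).
class ChildSupervisor {

    // Minimal view of a child process that the parent needs.
    interface Child {
        boolean isAlive();
        boolean ping(long timeoutMillis); // false if no answer in time, e.g. stuck in GC
        void destroy();
    }

    private final Supplier<Child> launcher; // starts a fresh child
    private final long pingTimeoutMillis;
    private Child current;
    private int restarts = 0;

    ChildSupervisor(Supplier<Child> launcher, long pingTimeoutMillis) {
        this.launcher = launcher;
        this.pingTimeoutMillis = pingTimeoutMillis;
        this.current = launcher.get();
    }

    // One supervision step; returns true if the child was replaced.
    boolean check() {
        // isAlive() is checked first, so a dead child is never pinged.
        if (current.isAlive() && current.ping(pingTimeoutMillis)) {
            return false; // healthy
        }
        current.destroy(); // no-op if it already exited; kills it if it is hung
        current = launcher.get();
        restarts++;
        return true;
    }

    int restarts() {
        return restarts;
    }

    // Supervision loop for the parent's main thread.
    void superviseUntilInterrupted(long intervalMillis) {
        while (!Thread.currentThread().isInterrupted()) {
            check();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

The ping covers the case the child's own watcher cannot: during a stop-the-world GC the watcher thread is paused too, so only an external observer can notice that the child has stopped responding.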