Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 1.24.1
Fix Version/s: None
Component/s: None
Description
We're using the Tika Server with OCR:
java -jar /opt/tika/tika-server-1.24.1.jar -p 9998 -spawnChild -JXmx500m
Two undesirable things happen:
1. The CPU runs at 100% for more than 10 minutes, long after all Tika requests have finished.
The load comes from processes that show up in top as "tesseract" (OCR), which keep all CPU cores at 100%.
They eventually die (or finish?), but the machine is unusable in the meantime.
Expected behaviour: Tika cleans up the processes it spawns, at the latest once its timeout limit expires (which is 2 minutes, I believe?).
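To make the expected behaviour concrete, here is a minimal Python sketch of running an external OCR process under a hard timeout and killing it, together with anything it spawned, once the timeout expires. This is not Tika's code (Tika is Java). The function name `run_with_hard_timeout` is hypothetical, the sketch assumes a POSIX system, and the 120-second value in the usage line is only the timeout we believe Tika uses.

```python
import os
import signal
import subprocess
from typing import Optional, Sequence


def run_with_hard_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it was killed on timeout.

    Hypothetical helper, POSIX-only. The child is started in its own session,
    which also makes it the leader of a new process group, so on timeout the
    whole group can be killed instead of leaving processes burning CPU.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # The group id equals the leader's pid because of start_new_session.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # every process in the group already exited
        proc.wait()  # reap the child so it does not linger as a zombie
        return None
```

Example: `run_with_hard_timeout(["tesseract", "page.png", "out"], timeout_s=120)` returns `None` if OCR on `page.png` (a hypothetical input) has not finished after 120 seconds, and no tesseract process from that call is left running afterwards. Killing the process group rather than only the direct child is the point: a child that is killed while its own children keep running would reproduce exactly the symptom described above.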
2. The temp directory (/tmp) is full of files like:
root@38acd588ee22:/# ll /tmp/
total 197320
drwxrwxrwt 1 root root 24576 May 20 09:35 ./
drwxr-xr-x 1 root root 4096 May 20 08:40 ../
-rw-r--r-- 1 root root 9273920 May 20 08:56 TIKA_streamstore_11144988934311367241.tmp
-rw-r--r-- 1 root root 8938048 May 20 08:57 TIKA_streamstore_11649337406504198407.tmp
-rw-r--r-- 1 root root 9478720 May 20 08:56 TIKA_streamstore_13551529918743702933.tmp
-rw-r--r-- 1 root root 9151040 May 20 08:57 TIKA_streamstore_13568226047805501311.tmp
-rw-r--r-- 1 root root 7701056 May 20 08:56 TIKA_streamstore_13908373602714189455.tmp
…
-rw-r--r-- 1 root root 33367 May 20 08:55 apache-tika-11167866320029165062.tmp
-rw-r--r-- 1 root root 44353 May 20 08:54 apache-tika-1152515137515755865.tmp
-rw-r--r-- 1 root root 245279 May 20 08:52 apache-tika-12106368488659105236.tmp
-rw-r--r-- 1 root root 1759 May 20 08:47 apache-tika-12291680472524021463.tmp
…
These files accumulate and slowly fill up the disk.
Expected behaviour: Tika cleans up disk space after itself.
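As a stopgap until Tika cleans up after itself, a periodic job could delete Tika temp files that have not been modified for a while. Below is a minimal Python sketch of such a job. The file-name patterns come from the /tmp listing above. The function name `remove_stale_tika_temp_files` and the one-hour default age are assumptions. The sketch also assumes that any file older than the threshold no longer belongs to an in-flight request, so the threshold has to be longer than the slowest request.

```python
import glob
import os
import time
from typing import List

# File-name patterns seen in the /tmp listing of this report.
TIKA_TEMP_PATTERNS = ("TIKA_streamstore_*.tmp", "apache-tika-*.tmp")


def remove_stale_tika_temp_files(tmp_dir: str = "/tmp", max_age_s: float = 3600.0) -> List[str]:
    """Delete Tika temp files in tmp_dir not modified for max_age_s seconds.

    Hypothetical workaround, not part of Tika. Returns the removed paths, sorted.
    """
    cutoff = time.time() - max_age_s
    removed = []
    for pattern in TIKA_TEMP_PATTERNS:
        for path in glob.glob(os.path.join(tmp_dir, pattern)):
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed.append(path)
            except FileNotFoundError:
                # The file disappeared between glob() and stat()/remove(),
                # e.g. because Tika deleted it itself; nothing to do.
                pass
    return sorted(removed)
```

Run from cron (for example every 15 minutes), this keeps /tmp bounded but only treats the symptom. Using the modification time as the staleness signal is a heuristic: a request that stalls for longer than `max_age_s` could still lose its temp file.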
These bugs are critical for us. What's the best way to avoid them?