Description
During development I noticed many shutdown errors from remote interpreters, for example:
2021-01-25T10:43:33.2749004Z WARN [2021-01-25 10:43:33,274] ({Exec Default Executor} ProcessLauncher.java[onProcessFailed]:134) - Process with cmd [/home/runner/work/zeppelin/zeppelin/zeppelin-zengine/../bin/interpreter.sh, -d, /home/runner/work/zeppelin/zeppelin/zeppelin-zengine/../interpreter_NotebookTest/test, -c, 10.1.0.4, -p, 40207, -r, :, -i, test-isolated-2FYUBYUH2-2021-01-25_10-43-31, -l, /home/runner/work/zeppelin/zeppelin/zeppelin-zengine/../local-repo/test, -g, test] is failed due to
2021-01-25T10:43:33.2755177Z org.apache.commons.exec.ExecuteException: Process exited with an error: 143 (Exit value: 143)
2021-01-25T10:43:33.2757145Z at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
2021-01-25T10:43:33.2759258Z at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
2021-01-25T10:43:33.2760971Z at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
2021-01-25T10:43:33.2762144Z at java.lang.Thread.run(Thread.java:748)
The Zeppelin server does not wait for a clean shutdown of the remote interpreter but terminates the process abruptly. Exit value 143 conventionally means the process was ended by SIGTERM (128 + 15). The relevant code is located in RemoteInterpreterManagedProcess.
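One possible shape for a more patient shutdown is sketched below. It is not the existing Zeppelin code: the class GracefulStop, its stop method, and the graceMillis parameter are hypothetical names, and the sketch assumes the interpreter has already been asked to shut down (for example through an RPC call) before stop is invoked. It only uses the standard java.lang.Process API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Zeppelin: escalates from waiting, to
// destroy(), to destroyForcibly() instead of killing the process right away.
public class GracefulStop {

    /** How the process finally went away. */
    public enum Result { EXITED_CLEANLY, TERMINATED, KILLED }

    /**
     * Assumes a shutdown request was already sent to the process.
     * graceMillis is the (assumed) time budget for each escalation step.
     */
    public static Result stop(Process process, long graceMillis)
            throws InterruptedException {
        // Step 1: give the process time to finish its own shutdown.
        if (process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
            return Result.EXITED_CLEANLY;
        }
        // Step 2: request termination (typically SIGTERM on Unix-like systems).
        process.destroy();
        if (process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
            return Result.TERMINATED;
        }
        // Step 3: last resort, forcible kill.
        process.destroyForcibly();
        process.waitFor();
        return Result.KILLED;
    }
}
```

With such a helper, an interpreter that exits on its own within the grace period would not produce the ExecuteException above, because it is never signalled. Only a process that ignores the shutdown request is escalated to a hard kill.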
We should also make RemoteInterpreterManagedProcess abstract and move the exec code into a new class, because RemoteInterpreterManagedProcess currently contains a lot of code that is only needed when the Zeppelin server controls a remote interpreter via exec.
By now, many remote interpreter processes are started through API calls to a cluster manager (e.g. K8s, YARN, Docker), and these cannot reuse the code in the RemoteInterpreterManagedProcess class.
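The proposed split could look roughly like the following sketch. All class and method names here (LauncherSplitSketch, ManagedInterpreterProcess, ExecInterpreterProcess, ClusterInterpreterProcess, launch, terminate) are hypothetical and chosen for illustration. The real refactoring would have to follow the existing Zeppelin class hierarchy. The idea is that the shared lifecycle lives in an abstract base class, while exec-specific and cluster-manager-specific behaviour sit in separate subclasses.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are assumptions, not Zeppelin's API.
public class LauncherSplitSketch {

    /** Lifecycle shared by all launch mechanisms. */
    public abstract static class ManagedInterpreterProcess {
        private final List<String> events = new ArrayList<>();
        private boolean running;

        public final void start() {
            if (!running) {
                launch();
                running = true;
            }
        }

        public final void stop() {
            if (running) {
                terminate();
                running = false;
            }
        }

        public boolean isRunning() { return running; }

        /** Recorded steps, kept only so the sketch is observable. */
        public List<String> events() { return events; }

        protected void record(String event) { events.add(event); }

        /** How the interpreter is brought up (exec, cluster API, ...). */
        protected abstract void launch();

        /** How the interpreter is taken down after a shutdown request. */
        protected abstract void terminate();
    }

    /** Would hold the exec-only code (command line, watchdog, exit codes). */
    public static class ExecInterpreterProcess extends ManagedInterpreterProcess {
        @Override protected void launch() { record("exec:launch"); }
        @Override protected void terminate() { record("exec:terminate"); }
    }

    /** Would talk to a cluster manager API instead of a local process. */
    public static class ClusterInterpreterProcess extends ManagedInterpreterProcess {
        @Override protected void launch() { record("cluster:launch"); }
        @Override protected void terminate() { record("cluster:terminate"); }
    }
}
```

In this layout, launchers backed by a cluster manager would inherit the shared lifecycle without carrying the exec-specific code.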