Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 0.8.0
- None
Description
We have configured our interpreters to time out after 60 minutes. From time to time, an interpreter is not closed properly and the remote interpreter process stays alive. This behavior is non-deterministic.
When the timeout is reached, only the following is logged:
INFO [2018-04-27 13:06:44,329] ({Timer-0} TimeoutLifecycleManager.java[run]:49) - InterpreterGroup spark:shared_process is timeout.
INFO [2018-04-27 13:06:44,329] ({Timer-0} ManagedInterpreterGroup.java[close]:89) - Close InterpreterGroup: spark:shared_process
INFO [2018-04-27 13:06:44,329] ({Timer-0} ManagedInterpreterGroup.java[close]:100) - Close Session: 2D8VRV5M6 for interpreter setting: spark
WARN [2018-04-27 13:06:44,329] ({Timer-0} RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is not opened for org.apache.zeppelin.spark.SparkInterpreter
WARN [2018-04-27 13:06:44,330] ({Timer-0} RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is not opened for org.apache.zeppelin.spark.SparkSqlInterpreter
WARN [2018-04-27 13:06:44,330] ({Timer-0} RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is not opened for org.apache.zeppelin.spark.DepInterpreter
WARN [2018-04-27 13:06:44,330] ({Timer-0} RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is not opened for org.apache.zeppelin.spark.PySparkInterpreter
WARN [2018-04-27 13:06:44,330] ({Timer-0} RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is not opened for org.apache.zeppelin.spark.IPySparkInterpreter
WARN [2018-04-27 13:06:44,330] ({Timer-0} RemoteInterpreter.java[close]:199) - close is called when RemoterInterpreter is not opened for org.apache.zeppelin.spark.SparkRInterpreter
INFO [2018-04-27 13:06:44,330] ({Timer-0} ManagedInterpreterGroup.java[close]:105) - Remove this InterpreterGroup: spark:shared_process as all the sessions are closed
On a successful shutdown we also see the following log entries, which are missing when this bug occurs:
...
INFO [2018-04-27 13:11:20,485] ({Timer-0} ManagedInterpreterGroup.java[close]:108) - Kill RemoteInterpreterProcess
INFO [2018-04-27 13:11:20,485] ({Timer-0} RemoteInterpreterManagedProcess.java[stop]:220) - Kill interpreter process
ERROR [2018-04-27 13:11:20,692] ({Thread-71907} RemoteInterpreterEventPoller.java[run]:257) - Can not get RemoteInterpreterEvent because it is shutdown.
ERROR [2018-04-27 13:11:20,692] ({pool-30-thread-1} AppendOutputRunner.java[run]:68) - Wait for OutputBuffer queue interrupted: null
WARN [2018-04-27 13:11:22,991] ({Timer-0} RemoteInterpreterManagedProcess.java[stop]:230) - ignore the exception when shutting down
INFO [2018-04-27 13:11:22,993] ({Timer-0} RemoteInterpreterManagedProcess.java[stop]:238) - Remote process terminated
So when the bug occurs, line 108 of ManagedInterpreterGroup.java (the "Kill RemoteInterpreterProcess" step) is never reached.
When a notebook is triggered after the timeout has occurred, an additional interpreter is started and the first one stays alive indefinitely.
Restarting the interpreter does not kill the first process either.
Only restarting Zeppelin kills all orphaned interpreter processes.
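To check an existing Zeppelin log for occurrences of this pattern, one option is to look for "is timeout" entries that are not followed by a "Kill RemoteInterpreterProcess" entry. The following Python sketch does that. It is not part of Zeppelin, the function names are made up for illustration, and it assumes that the kill entry, when present, is logged before the next timeout entry (the kill message does not name the interpreter group, so entries can only be paired by order).

```python
import re

# Header of one log entry: LEVEL [timestamp] ({thread} File.java[method]:line) -
ENTRY = re.compile(
    r"(?P<level>INFO|WARN|ERROR)\s+\[(?P<ts>[^\]]+)\]\s+"
    r"\(\{(?P<thread>[^}]+)\}\s+(?P<src>[^)]+)\)\s+-\s+"
)
TIMEOUT = re.compile(r"InterpreterGroup (\S+) is timeout\.")
KILL_PREFIX = "Kill RemoteInterpreterProcess"


def split_entries(text):
    """Split log text into (level, timestamp, message) tuples.

    Works whether entries are one per line or run together on one line,
    because the message is taken as everything up to the next entry header.
    """
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((m.group("level"), m.group("ts"), text[m.end():end].strip()))
    return entries


def unkilled_timeouts(text):
    """Return (group, timestamp) for each timeout with no later kill entry.

    Assumption: a kill entry belongs to the most recent timeout entry.
    """
    pending = None
    leaked = []
    for _level, ts, msg in split_entries(text):
        m = TIMEOUT.match(msg)
        if m:
            if pending is not None:
                # Previous timeout never saw a kill before this one.
                leaked.append(pending)
            pending = (m.group(1), ts)
        elif msg.startswith(KILL_PREFIX):
            pending = None
    if pending is not None:
        leaked.append(pending)
    return leaked
```

Calling `unkilled_timeouts(open(path).read())` on a server log (with `path` pointing at that file) returns the interpreter groups and timestamps worth checking against the running processes.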
Issue Links
- is related to ZEPPELIN-2385 Release 0.8.0 (Resolved)