Details
Type: Bug
Status: Reopened
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.8.2, 0.9.0
Description
We run Zeppelin with 10-20 users working primarily in Spark. Every few days, and sometimes several times a day, the Zeppelin web UI becomes unresponsive, and the only fix we have found is to restart Zeppelin. This is extremely disruptive.
"Unresponsive" usually means one or more of the following: new paragraphs can no longer be created, clicking run does nothing or leaves the paragraph stuck in pending forever, new notebooks cannot be created, or existing notebooks fail to load.
We have added monitoring to the box Zeppelin runs on and see nothing out of the ordinary in GC rates, CPU utilization, memory usage, or heap utilization.
We also don't see anything unusual in the logs. Is there any other way we can diagnose this issue to help find the root cause? 0.9 is currently too broken for us to use (based on builds of the live code on 1/27/2020 and again on 2/3/2020).
Attaching a copy of the logs just in case.
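Since GC, CPU, memory, and heap all look normal, one further piece of evidence that might help is a JVM thread dump of the Zeppelin server taken while the UI is hung. Below is a minimal sketch of a watchdog that could collect one automatically. It rests on several assumptions that are not taken from this report: that Zeppelin listens on `localhost:8080`, that the REST endpoint `/api/version` is a usable liveness probe, that the JDK's `jstack` tool is on the PATH, and that the server's process id is known. The names `probe`, `dump_threads`, and `watch` are hypothetical helpers. A REST probe may keep succeeding while paragraphs hang, in which case `dump_threads` can simply be called by hand during an incident.

```python
import subprocess
import time
import urllib.request
from typing import Callable, List, Optional

# Assumption: Zeppelin's REST API is reachable here; adjust host and port as needed.
ZEPPELIN_URL = "http://localhost:8080/api/version"


def probe(url: str = ZEPPELIN_URL, timeout: float = 10.0) -> bool:
    """Return True if the server answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Any failure (timeout, refused connection, HTTP error) counts as "not alive".
        return False


def dump_threads(pid: int, out_path: str) -> None:
    """Write a thread dump of JVM process `pid` to `out_path` using jstack."""
    with open(out_path, "w") as out:
        subprocess.run(
            ["jstack", "-l", str(pid)],  # -l adds lock information to the dump
            stdout=out,
            stderr=subprocess.STDOUT,
            check=False,
        )


def watch(
    pid: int,
    is_alive: Callable[[], bool] = probe,
    dump: Callable[[int, str], None] = dump_threads,
    failures_needed: int = 3,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    max_checks: Optional[int] = None,
) -> List[str]:
    """Poll the server; after `failures_needed` consecutive failed probes,
    take a thread dump. Returns the paths of all dumps taken."""
    failures = 0
    checks = 0
    dumps: List[str] = []
    while max_checks is None or checks < max_checks:
        checks += 1
        if is_alive():
            failures = 0
        else:
            failures += 1
            if failures >= failures_needed:
                path = f"zeppelin-threads-{int(time.time())}-{checks}.txt"
                dump(pid, path)
                dumps.append(path)
                failures = 0  # start counting again before the next dump
        sleep(interval)
    return dumps
```

Running `watch(<zeppelin server pid>)` on the Zeppelin host would leave timestamped thread dump files in the working directory, which could then be attached alongside the logs. Requiring several consecutive failures is meant to avoid dumping on a single slow response.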