Zeppelin / ZEPPELIN-4599

Zeppelin becomes unresponsive and can only be recovered by restart


Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.2, 0.9.0
    • None
    • None
    • None

    Description

      We use Zeppelin with 10-20 users working primarily in Spark. Every few days, and sometimes multiple times per day, the Zeppelin web UI becomes unresponsive, and the only solution we have found is to restart Zeppelin. This is extremely disruptive.

      "Unresponsive" usually means one or more of the following: new paragraphs can no longer be created, clicking Run does nothing or leaves the paragraph stuck in pending forever, new notebooks cannot be created, or existing notebooks fail to load.

      We have added monitoring to the box Zeppelin runs on and see nothing out of the ordinary in GC rates, CPU utilization, memory usage, or heap utilization.

      We also don't see anything unusual in the logs. Is there any other way we can diagnose this issue to help find the root cause? 0.9 is currently too broken to use (based on builds of the live code on 1/27/2020 and again on 2/3/2020).

      Attaching a copy of the logs just in case.
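
      For anyone triaging the attached jstack dumps, one possible first pass is to count threads per state and list the BLOCKED ones. The following is only a minimal sketch: it assumes the usual HotSpot jstack text layout (a quoted thread name on the header line, followed by a `java.lang.Thread.State:` line), the function name `summarize_thread_states` is hypothetical, and `SAMPLE` is synthetic text, not an excerpt from the attachments.

```python
import re
from collections import Counter

# Matches lines such as:  java.lang.Thread.State: BLOCKED (on object monitor)
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)")
# Matches thread header lines, which start with the quoted thread name.
NAME_RE = re.compile(r'^"([^"]+)"')


def summarize_thread_states(dump_text):
    """Return (Counter of thread states, list of BLOCKED thread names)."""
    states = Counter()
    blocked = []
    current_name = None
    for line in dump_text.splitlines():
        name_match = NAME_RE.match(line)
        if name_match:
            # Remember which thread the following state line belongs to.
            current_name = name_match.group(1)
            continue
        state_match = STATE_RE.match(line)
        if state_match:
            state = state_match.group(1)
            states[state] += 1
            if state == "BLOCKED" and current_name is not None:
                blocked.append(current_name)
    return states, blocked


# Synthetic two-thread dump, used for illustration only (an assumption,
# not taken from the attached logs).
SAMPLE = '''\
"qtp-1" #12 prio=5 os_prio=0 tid=0x1 nid=0x2 waiting for monitor entry [0x3]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at example.Foo.bar(Foo.java:1)

"main" #1 prio=5 os_prio=0 tid=0x4 nid=0x5 runnable [0x6]
   java.lang.Thread.State: RUNNABLE
'''

if __name__ == "__main__":
    states, blocked = summarize_thread_states(SAMPLE)
    print(dict(states))
    print(blocked)
```

      To apply it to a real dump, read the file contents (for example zeppelin-server-jstack.log) and pass the text to the function. Comparing the summaries of several dumps taken a few seconds apart may show whether the same threads stay blocked.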

      Attachments

        1. zeppelin-yarn-zeppelin-210.sec.placeiq.net.out
          1.23 MB
          Paul Brenner
        2. zeppelin-yarn-zeppelin-210.sec.placeiq.net.log
          1.39 MB
          Paul Brenner
        3. zeppelin-server-jstack.log
          252 kB
          Paul Brenner
        4. example-interpreter-process-jstack.log
          114 kB
          Paul Brenner
        5. zeppelin9jstack log.out
          63 kB
          Paul Brenner
        6. jstack.log
          300 kB
          Sha Xia
        7. image-2021-02-20-17-13-14-559.png
          100 kB
          Sha Xia
        8. image-2021-02-20-17-13-14-559.png
          100 kB
          Sha Xia
        9. zeppelin_server.log
          12.97 MB
          Sha Xia
        10. zeppelin_server_13_34.log
          1.35 MB
          Sha Xia
        11. zeppelin212-2021-03-23.out
          44 kB
          Paul Brenner
        12. zeppelin212-2021-03-26.out
          238 kB
          Paul Brenner


          People

            Assignee: Unassigned
            Reporter: Paul Brenner (pbrenner)

