  Livy / LIVY-324

RSCClient can lose its session/interpreter reference, leaking job sessions


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.3
    • Fix Version/s: None
    • Component/s: REPL, RSC
    • Labels:
      None

      Description

      We are seeing an issue where Livy appears unable to kill a PySpark session because it loses its connection to that session's ProcessInterpreter.

      User observation: the user launches a PySpark session from Hue Notebooks, and the session is created and goes to running. The user's job hits errors and the session returns to idle, but it then stays running for more than 24 hours. Normally an idle session is killed automatically after 1 hour.
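The expected behaviour described above is an idle-timeout check on the server side: a session that has been idle longer than the timeout gets stopped. A minimal sketch of such a check, assuming a hypothetical `Session` record with a state string and a last-activity timestamp (these names are illustrative, not Livy's classes), using the one-hour timeout from the report:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class IdleSessionReaper {
    // Hypothetical snapshot of a session: id, state name, and time of last activity.
    record Session(int id, String state, Instant lastActivity) {}

    // Returns the ids of sessions that are idle and have been inactive for at least `timeout`.
    static List<Integer> sessionsToStop(List<Session> sessions, Instant now, Duration timeout) {
        List<Integer> expired = new ArrayList<>();
        for (Session s : sessions) {
            boolean idle = "idle".equals(s.state());
            boolean timedOut = Duration.between(s.lastActivity(), now).compareTo(timeout) >= 0;
            if (idle && timedOut) {
                expired.add(s.id());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2017-01-01T12:00:00Z");
        List<Session> sessions = List.of(
                new Session(0, "idle", now.minus(Duration.ofMinutes(61))),
                new Session(1, "idle", now.minus(Duration.ofMinutes(5))),
                new Session(2, "busy", now.minus(Duration.ofHours(2))));
        System.out.println(sessionsToStop(sessions, now, Duration.ofHours(1))); // prints [0]
    }
}
```

In the case reported here, the shutdown does appear to start at the one-hour mark; it is the stop that follows which never completes.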

      In the Yarn task log, the AM starts normally and the SparkContext comes up. The user's job runs with errors and the SparkContext goes idle. The Yarn job then stays idle for 1 hour, at which point PythonInterpreter starts its shutdown:

      INFO PythonInterpreter: Shutting down process

      Nothing further appears in the Yarn log, and the Yarn job remains running.

      In the Livy log, we see the following timeout exception during the shutdown attempt:

      INFO com.cloudera.livy.Logging$class.info(40): Stopping InteractiveSession 0...
      WARN com.cloudera.livy.rsc.RSCClient.stop(220): Exception while waiting for end session reply.
      java.util.concurrent.TimeoutException
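The WARN from `RSCClient.stop(220)` is consistent with a bounded wait: the client sends an end-session request, waits on the reply with a timeout, and on `TimeoutException` logs a warning and carries on. A minimal sketch of that pattern, assuming a `CompletableFuture` stands in for the driver's reply (the class and method names are hypothetical, not Livy's actual code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class EndSessionWait {
    // Waits up to timeoutMs for the driver's reply to an end-session request.
    // Returns true if the reply arrived in time, false otherwise.
    static boolean awaitEndSessionReply(CompletableFuture<Void> reply, long timeoutMs) {
        try {
            reply.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Corresponds to the WARN above: the client stops waiting, but nothing in this
            // method makes the remote driver (or its Yarn application) exit.
            System.err.println("WARN Exception while waiting for end session reply: " + e);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            // The driver answered with an error instead of an acknowledgement.
            return false;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> acknowledged = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> neverAnswered = new CompletableFuture<>(); // driver never replies
        System.out.println(awaitEndSessionReply(acknowledged, 100));  // prints true
        System.out.println(awaitEndSessionReply(neverAnswered, 100)); // prints false (after a WARN on stderr)
    }
}
```

The point of the sketch is that a timed-out wait only ends the client's side of the conversation; whether the remote session is actually torn down depends entirely on the driver.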

      The Livy call path appears to be:
      -> repl/ProcessInterpreter.scala close(): the Yarn log shows "Shutting down process"
      -> repl/PythonInterpreter.scala sendShutdownRequest()
      -> livy/server/interactive/InteractiveSession.scala stopSession()
      -> livy/rsc/RSCClient.java stop(): the client hits the timeout error:
         livy.rsc.RSCClient.stop(220): Exception while waiting for end session reply
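One plausible reading of this trace is that the client's timeout is a symptom rather than the cause: if the driver only acknowledges end-session after the interpreter has closed, a `close()` that never returns (the Yarn log stops right after "Shutting down process") leaves the reply pending indefinitely. A minimal sketch of that dependency, with a hypothetical `Interpreter` interface and `handleEndSession` method standing in for the real driver code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ShutdownChain {
    // Hypothetical stand-in for the driver's interpreter.
    interface Interpreter {
        void close() throws InterruptedException;
    }

    // Daemon worker threads, so a hung close() cannot keep the JVM alive.
    private static final ExecutorService DRIVER = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "driver-end-session");
        t.setDaemon(true);
        return t;
    });

    // Driver side (hypothetical): the end-session reply is completed only after close() returns.
    static CompletableFuture<Void> handleEndSession(Interpreter interp) {
        CompletableFuture<Void> reply = new CompletableFuture<>();
        DRIVER.submit(() -> {
            try {
                interp.close();        // e.g. send the shutdown request and wait for the process
                reply.complete(null);  // only now does the client receive its answer
            } catch (Exception e) {
                reply.completeExceptionally(e);
            }
        });
        return reply;
    }

    public static void main(String[] args) throws InterruptedException {
        Interpreter healthy = () -> {};
        Interpreter hung = () -> Thread.sleep(Long.MAX_VALUE); // process never acknowledges

        CompletableFuture<Void> ok = handleEndSession(healthy);
        CompletableFuture<Void> stuck = handleEndSession(hung);
        Thread.sleep(200);
        System.out.println(ok.isDone());    // prints true
        System.out.println(stuck.isDone()); // prints false: the client can only time out
    }
}
```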

      It is not clear what happened. It appears that the client lost its reference to its session/ProcessInterpreter and can no longer complete the session close.

            People

            • Assignee:
              Unassigned
              Reporter:
              Pat White (patwhite)
            • Votes:
              0
              Watchers:
              2
