Livy / LIVY-541

Multiple Livy servers submitting to Yarn results in LivyException: Session is finished ... No YARN application is found with tag livy-session-197-uveqmqyj in 300 seconds. Please check your cluster status, it is may be very busy


    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.5.0
    • Fix Version/s: None
    • Component/s: Server
    • Labels:
      None
    • Environment:
      Hortonworks HDP 2.6

      Description

      It appears that Livy does not differentiate sessions properly in YARN. This causes errors when multiple Livy servers run behind a load balancer (for HA or performance scaling) against the same Hadoop cluster.

      Each Livy server assigns monotonically incrementing session IDs with a random suffix, but the random suffix does not appear to be passed to YARN. As a result, the Livy server that is further behind in session numbering fails with the error below: it appears to find that the session with the same number has already finished (that session was submitted earlier, by a different user, on another Livy server, as seen in the YARN RM UI):

      org.apache.zeppelin.livy.LivyException: Session 197 is finished, appId: null, log: [	at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2887), at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2904), at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511), at java.util.concurrent.FutureTask.run(FutureTask.java:266), at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142), at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617), at java.lang.Thread.run(Thread.java:748), 
      YARN Diagnostics: , java.lang.Exception: No YARN application is found with tag livy-session-197-uveqmqyj in 300 seconds. Please check your cluster status, it is may be very busy., org.apache.livy.utils.SparkYarnApp.org$apache$livy$utils$SparkYarnApp$$getAppIdFromTag(SparkYarnApp.scala:182) org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:239) org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:236) scala.Option.getOrElse(Option.scala:120) org.apache.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:236) org.apache.livy.Utils$$anon$1.run(Utils.scala:94)]
      at org.apache.zeppelin.livy.BaseLivyInterpreter.createSession(BaseLivyInterpreter.java:300)
      at org.apache.zeppelin.livy.BaseLivyInterpreter.initLivySession(BaseLivyInterpreter.java:184)
      at org.apache.zeppelin.livy.LivySharedInterpreter.open(LivySharedInterpreter.java:57)
      at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
      at org.apache.zeppelin.livy.BaseLivyInterpreter.getLivySharedInterpreter(BaseLivyInterpreter.java:165)
      at org.apache.zeppelin.livy.BaseLivyInterpreter.open(BaseLivyInterpreter.java:139)
      at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:493)
      at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
      at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:748)
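      To make the suspected failure mode concrete, here is a minimal Python sketch under stated assumptions. It is not Livy's code: ServerModel, find_apps and wait_for_app are hypothetical names, and the YARN ResourceManager is faked as a dict mapping made-up application IDs to sets of tags. It models two servers whose session counters both start at 0, shows that a lookup keyed only on the numeric session ID would match the other server's application while a lookup on the full tag (including the random suffix) would not, and mimics the poll-until-timeout behaviour suggested by the "No YARN application is found with tag ... in 300 seconds" message.

```python
import itertools
import secrets
import string
import time


class ServerModel:
    """Hypothetical stand-in for one Livy server with its own session counter."""

    def __init__(self):
        # Each server counts from 0 independently, so numeric IDs collide across servers.
        self._next_id = itertools.count()

    def new_session_tag(self, suffix=None):
        """Return (session_id, tag) using the 'livy-session-<id>-<suffix>' shape seen in the log."""
        session_id = next(self._next_id)
        if suffix is None:
            # Assumption: an 8-letter lowercase random suffix, like 'uveqmqyj' in the log.
            suffix = "".join(secrets.choice(string.ascii_lowercase) for _ in range(8))
        return session_id, f"livy-session-{session_id}-{suffix}"


def find_apps(rm_apps, tag=None, session_id=None):
    """Return IDs of fake RM apps matching either the full tag or only the numeric session ID."""
    if tag is not None:
        # Exact match on the full tag, random suffix included.
        return [app for app, tags in rm_apps.items() if tag in tags]
    # Prefix match ignores the random suffix, so another server's session can match.
    prefix = f"livy-session-{session_id}-"
    return [app for app, tags in rm_apps.items() if any(t.startswith(prefix) for t in tags)]


def wait_for_app(rm_apps, tag, timeout_s, poll_s=0.05):
    """Poll the fake RM until an app carries `tag`, or raise once `timeout_s` has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        matches = find_apps(rm_apps, tag=tag)
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No YARN application is found with tag {tag} in {timeout_s} seconds.")
        time.sleep(poll_s)


if __name__ == "__main__":
    server_a, server_b = ServerModel(), ServerModel()
    id_a, tag_a = server_a.new_session_tag()
    id_b, tag_b = server_b.new_session_tag()
    rm = {"application_1_0001": {tag_a}}  # only server A's application has been submitted
    print(id_a, id_b)                               # both servers hand out session ID 0
    print(find_apps(rm, session_id=id_b))           # numeric-ID lookup finds A's application
    print(find_apps(rm, tag=tag_b))                 # full-tag lookup finds nothing for B
```

      In this model, only a lookup that ignores the suffix would confuse the two servers' sessions; a full-tag lookup for a session whose application never reaches YARN simply times out.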
      


    People

    • Assignee: Unassigned
    • Reporter: Hari Sekhon (harisekhon)


    Time Tracking

    • Estimated: Not Specified
    • Remaining: 0h
    • Logged: 20m
