Zeppelin / ZEPPELIN-3126

More than 2 notebooks in R fail with error "sparkr interpreter not responding"


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Invalid
    • Affects Version/s: 0.7.2
    • Fix Version/s: None
    • Component/s: r-interpreter
    • Labels: None
    • Environment: Spark version 1.6.2

    Description

The Spark interpreter is in per-note scoped mode.
      Please find the steps below to reproduce the issue:
      1. Create a notebook (Note1) and run any R code in a paragraph. I ran the following code:
      %r
      rdf <- data.frame(c(1,2,3,4))
      colnames(rdf) <- c("myCol")
      sdf <- createDataFrame(sqlContext, rdf)
      withColumn(sdf, "newCol", sdf$myCol * 2.0)

2. Create another notebook (Note2) and run any R code in a paragraph. I ran the same code as above.

Up to this point, everything works fine.

3. Create a third notebook (Note3) and run any R code in a paragraph. I ran the same code. This notebook fails with the error:
      org.apache.zeppelin.interpreter.InterpreterException: sparkr is not responding

Restarting the sparkr interpreter resolves the problem, and two more notebooks can then be run successfully; but the third notebook run through the sparkr interpreter again throws the error.
      Once a notebook throws the error, all further notebooks throw the same error. Each time one of the failed notebooks is re-run, a new R shell process is started, and these processes are not killed even if the failed notebook is deleted; i.e., the original R shell is not reused after a failure.
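      For reference, a minimal sketch of how the leftover shells can be listed from a notebook paragraph, assuming a Linux host and the /tmp/zeppelin_sparkr-* script naming seen in the interpreter log below:

      %r
      # Hypothetical check: list the R shell processes spawned by Zeppelin.
      # The "zeppelin_sparkr" pattern is taken from the bootstrap script name
      # in the log below; the [z] keeps grep from matching its own process.
      # Returns character(0) (with a warning) when no shells are left over.
      leftover <- system("ps -ef | grep '[z]eppelin_sparkr'", intern = TRUE)
      print(leftover)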

The interpreter log is as follows:
INFO [2018-01-03 12:10:05,960] ({pool-2-thread-9} Logging.scala[logInfo]:58) - Starting HTTP Server
      INFO [2018-01-03 12:10:05,961] ({pool-2-thread-9} Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
      INFO [2018-01-03 12:10:05,963] ({pool-2-thread-9} AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:58989
      INFO [2018-01-03 12:10:05,963] ({pool-2-thread-9} Logging.scala[logInfo]:58) - Successfully started service 'HTTP class server' on port 58989.
      INFO [2018-01-03 12:10:06,094] ({dispatcher-event-loop-1} Logging.scala[logInfo]:58) - Removed broadcast_1_piece0 on localhost:42453 in memory (size: 854.0 B, free: 511.1 MB)
      INFO [2018-01-03 12:10:07,049] ({pool-2-thread-9} ZeppelinR.java[createRScript]:353) - File /tmp/zeppelin_sparkr-5046601627391341672.R created
      ERROR [2018-01-03 12:10:17,051] ({pool-2-thread-9} Job.java[run]:188) - Job failed
      org.apache.zeppelin.interpreter.InterpreterException: sparkr is not responding

      R version 3.4.1 (2017-06-30) – "Single Candle"
      Copyright (C) 2017 The R Foundation for Statistical Computing
      Platform: x86_64-pc-linux-gnu (64-bit)
      ....
      ....
      > args <- commandArgs(trailingOnly = TRUE)
      > hashCode <- as.integer(args[1])
      > port <- as.integer(args[2])
      > libPath <- args[3]
      > version <- as.integer(args[4])
      > rm(args)
      > print(paste("Port ", toString(port)))
[1] "Port 58063"
      > print(paste("LibPath ", libPath))
      [1] "LibPath /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib"
      > .libPaths(c(file.path(libPath), .libPaths()))
      > library(SparkR)
      Attaching package: ‘SparkR’

      The following objects are masked from ‘package:stats’:

      cov, filter, lag, na.omit, predict, sd, var

      The following objects are masked from ‘package:base’:
      colnames, colnames<-, endsWith, intersect, rank, rbind, sample,
      startsWith, subset, summary, table, transform

      > SparkR:::connectBackend("localhost", port, 6000)
A connection with
      description "->localhost:58063"
      class       "sockconn"
      mode        "wb"
      text        "binary"
      opened      "opened"
      can read    "yes"
      can write   "yes"

      > # scStartTime is needed by R/pkg/R/sparkR.R
      > assign(".scStartTime", as.integer(Sys.time()), envir = SparkR:::.sparkREnv)
      > # getZeppelinR
      > .zeppelinR = SparkR:::callJStatic("org.apache.zeppelin.spark.ZeppelinR", "getZeppelinR", hashCode)
      at org.apache.zeppelin.spark.ZeppelinR.waitForRScriptInitialized(ZeppelinR.java:285)
      at org.apache.zeppelin.spark.ZeppelinR.request(ZeppelinR.java:227)
      at org.apache.zeppelin.spark.ZeppelinR.eval(ZeppelinR.java:176)
      at org.apache.zeppelin.spark.ZeppelinR.open(ZeppelinR.java:165)
      at org.apache.zeppelin.spark.SparkRInterpreter.open(SparkRInterpreter.java:90)
      at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
      at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
      at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
INFO [2018-01-03 12:10:17,070] ({pool-2-thread-9} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1514961605951 finished by scheduler org.apache.zeppelin.spark.SparkRInterpreter392022746
      INFO [2018-01-03 12:39:22,664] ({Spark Context Cleaner} Logging.scala[logInfo]:58) - Cleaned accumulator 2
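
      The ten-second gap between the bootstrap script being created (12:10:05,963 createRScript at 12:10:07) and the job failing (12:10:17) suggests waitForRScriptInitialized gives up after a fixed initialization wait. As a rough illustration only (this is not Zeppelin's actual source, and the timeout value is inferred from the timestamps above, not read from Zeppelin's configuration), the wait amounts to a bounded poll:

      # Rough sketch of a bounded poll, mirroring what the Java-side
      # waitForRScriptInitialized appears to do; the 10 s default is an
      # assumption inferred from the log timestamps above.
      wait_until <- function(predicate, timeout_sec = 10, interval_sec = 0.5) {
        deadline <- Sys.time() + timeout_sec
        while (Sys.time() < deadline) {
          if (isTRUE(predicate())) return(TRUE)
          Sys.sleep(interval_sec)
        }
        FALSE  # caller raises "sparkr is not responding" on FALSE
      }

      # Hypothetical usage: wait for the generated bootstrap script to appear.
      ready <- wait_until(function() length(Sys.glob("/tmp/zeppelin_sparkr-*.R")) > 0)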

          People

            Assignee: Unassigned
            Reporter: Meethu Mathew
