Zeppelin / ZEPPELIN-3126

More than 2 notebooks in R fail with error "sparkr interpreter not responding"


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Invalid
    • Affects Version/s: 0.7.2
    • Fix Version/s: None
    • Component/s: r-interpreter
    • Labels: None
    • Environment: Spark version 1.6.2

    Description

The Spark interpreter is in per-note scoped mode.
      Please find the steps below to reproduce the issue:
      1. Create a notebook (Note1) and run any R code in a paragraph. I ran the following code:
      %r
      rdf <- data.frame(c(1,2,3,4))
      colnames(rdf) <- c("myCol")
      sdf <- createDataFrame(sqlContext, rdf)
      withColumn(sdf, "newCol", sdf$myCol * 2.0)

2. Create another notebook (Note2) and run any R code in a paragraph. I ran the same code as above.

Up to this point, everything works fine.

3. Create a third notebook (Note3) and run any R code in a paragraph. I ran the same code. This notebook fails with the error:
      org.apache.zeppelin.interpreter.InterpreterException: sparkr is not responding

Restarting the sparkr interpreter resolves the problem, and two more notebooks can then be run successfully; but the third notebook run through the sparkr interpreter again throws the error.
      Once a notebook throws the error, all further notebooks throw the same error. Each time one of the failed notebooks is re-run, a new R shell process is started, and these processes are not killed even if the failed notebook is deleted; i.e., the original R shell is not reused after a failure.
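      For reference, a minimal sketch of how the leftover shells can be listed from a notebook paragraph, assuming a Linux host and the /tmp/zeppelin_sparkr-* script naming seen in the interpreter log below:

      %r
      # Hypothetical check: list the R shell processes spawned by Zeppelin.
      # The "zeppelin_sparkr" pattern is taken from the bootstrap script name
      # in the log below; the [z] keeps grep from matching its own process.
      # Returns character(0) (with a warning) when no shells are left over.
      leftover <- system("ps -ef | grep '[z]eppelin_sparkr'", intern = TRUE)
      print(leftover)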

The interpreter log is as follows:
INFO [2018-01-03 12:10:05,960] ({pool-2-thread-9} Logging.scala[logInfo]:58) - Starting HTTP Server
      INFO [2018-01-03 12:10:05,961] ({pool-2-thread-9} Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
      INFO [2018-01-03 12:10:05,963] ({pool-2-thread-9} AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:58989
      INFO [2018-01-03 12:10:05,963] ({pool-2-thread-9} Logging.scala[logInfo]:58) - Successfully started service 'HTTP class server' on port 58989.
      INFO [2018-01-03 12:10:06,094] ({dispatcher-event-loop-1} Logging.scala[logInfo]:58) - Removed broadcast_1_piece0 on localhost:42453 in memory (size: 854.0 B, free: 511.1 MB)
      INFO [2018-01-03 12:10:07,049] ({pool-2-thread-9} ZeppelinR.java[createRScript]:353) - File /tmp/zeppelin_sparkr-5046601627391341672.R created
      ERROR [2018-01-03 12:10:17,051] ({pool-2-thread-9} Job.java[run]:188) - Job failed
      org.apache.zeppelin.interpreter.InterpreterException: sparkr is not responding

      R version 3.4.1 (2017-06-30) – "Single Candle"
      Copyright (C) 2017 The R Foundation for Statistical Computing
      Platform: x86_64-pc-linux-gnu (64-bit)
      ....
      ....
      > args <- commandArgs(trailingOnly = TRUE)
      > hashCode <- as.integer(args[1])
      > port <- as.integer(args[2])
      > libPath <- args[3]
      > version <- as.integer(args[4])
      > rm(args)
      > print(paste("Port ", toString(port)))
[1] "Port 58063"
      > print(paste("LibPath ", libPath))
      [1] "LibPath /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib"
      > .libPaths(c(file.path(libPath), .libPaths()))
      > library(SparkR)
      Attaching package: ‘SparkR’

      The following objects are masked from ‘package:stats’:

      cov, filter, lag, na.omit, predict, sd, var

      The following objects are masked from ‘package:base’:
      colnames, colnames<-, endsWith, intersect, rank, rbind, sample,
      startsWith, subset, summary, table, transform

      > SparkR:::connectBackend("localhost", port, 6000)
A connection with
      description "->localhost:58063"
      class       "sockconn"
      mode        "wb"
      text        "binary"
      opened      "opened"
      can read    "yes"
      can write   "yes"

      > # scStartTime is needed by R/pkg/R/sparkR.R
      > assign(".scStartTime", as.integer(Sys.time()), envir = SparkR:::.sparkREnv)
      > # getZeppelinR
      > .zeppelinR = SparkR:::callJStatic("org.apache.zeppelin.spark.ZeppelinR", "getZeppelinR", hashCode)
      at org.apache.zeppelin.spark.ZeppelinR.waitForRScriptInitialized(ZeppelinR.java:285)
      at org.apache.zeppelin.spark.ZeppelinR.request(ZeppelinR.java:227)
      at org.apache.zeppelin.spark.ZeppelinR.eval(ZeppelinR.java:176)
      at org.apache.zeppelin.spark.ZeppelinR.open(ZeppelinR.java:165)
      at org.apache.zeppelin.spark.SparkRInterpreter.open(SparkRInterpreter.java:90)
      at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
      at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
      at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
INFO [2018-01-03 12:10:17,070] ({pool-2-thread-9} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1514961605951 finished by scheduler org.apache.zeppelin.spark.SparkRInterpreter392022746
      INFO [2018-01-03 12:39:22,664] ({Spark Context Cleaner} Logging.scala[logInfo]:58) - Cleaned accumulator 2
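
      The ten-second gap between the bootstrap script being created (12:10:05,963 createRScript at 12:10:07) and the job failing (12:10:17) suggests waitForRScriptInitialized gives up after a fixed initialization wait. As a rough illustration only (this is not Zeppelin's actual source, and the timeout value is inferred from the timestamps above, not read from Zeppelin's configuration), the wait amounts to a bounded poll:

      # Rough sketch of a bounded poll, mirroring what the Java-side
      # waitForRScriptInitialized appears to do; the 10 s default is an
      # assumption inferred from the log timestamps above.
      wait_until <- function(predicate, timeout_sec = 10, interval_sec = 0.5) {
        deadline <- Sys.time() + timeout_sec
        while (Sys.time() < deadline) {
          if (isTRUE(predicate())) return(TRUE)
          Sys.sleep(interval_sec)
        }
        FALSE  # caller raises "sparkr is not responding" on FALSE
      }

      # Hypothetical usage: wait for the generated bootstrap script to appear.
      ready <- wait_until(function() length(Sys.glob("/tmp/zeppelin_sparkr-*.R")) > 0)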

          People

            Assignee: Unassigned
            Reporter: Meethu Mathew
