Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
spark2 %spark2, %spark2.sql, %spark2.dep, %spark2.pyspark, %spark2.r
Option
The interpreter will be instantiated Per User in isolated process.
User Impersonate
Connect to existing process
Set permission
Properties
name                                      value
SPARK_HOME                                /usr/hdp/current/spark2-client/
args
master                                    local[*]
spark.app.name                            Zeppelin
spark.cores.max
spark.executor.memory
zeppelin.R.cmd                            R
zeppelin.R.image.width                    100%
zeppelin.R.knitr                          true
zeppelin.R.render.options                 out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F
zeppelin.dep.additionalRemoteRepository   spark-packages,http://dl.bintray.com/spark-packages/maven,false;
zeppelin.dep.localrepo                    local-repo
zeppelin.interpreter.localRepo            /usr/hdp/current/zeppelin-server/local-repo/2DRMGSB7A
zeppelin.interpreter.output.limit         102400
zeppelin.pyspark.python                   /usr/local/bin/zsparkpy
zeppelin.spark.concurrentSQL              false
zeppelin.spark.importImplicit             true
zeppelin.spark.maxResult                  1000
zeppelin.spark.printREPLOutput            true
zeppelin.spark.sql.stacktrace             false
zeppelin.spark.useHiveContext             true
Description
This is Zeppelin installed as part of HDP 2.6.3.0-235.
We have a Zeppelin system being used by a large class. Every interpreter except MD is configured to run with user impersonation, in isolated mode. Users primarily use spark2.
After a while the system becomes unusable. I've been restarting it once a day, but today even that wasn't enough.
Restarting my interpreter doesn't help, and indeed I believe this happens to all users.
Livy2 still works.
Our system is kerberized. Users automatically get Kerberos credentials when they log in (via PAM).
Once the problem occurs we get this kind of error:
ERROR [2018-10-17 16:04:55,608] ({Thread-2817} RemoteInterpreterEventPoller.java[run]:113) - Can't get RemoteInterpreterEvent
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getEvent(RemoteInterpreterService.java:429)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getEvent(RemoteInterpreterService.java:417)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterEventPoller.run(RemoteInterpreterEventPoller.java:110)
ERROR [2018-10-17 16:04:55,620] ({Thread-2819} JobProgressPoller.java[run]:54) - Can not get or update progress
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:500)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:121)
at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:333)
at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:51)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpreterService.java:313)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterService.java:298)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:497)
... 3 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 11 more
ERROR [2018-10-17 16:04:55,618] ({pool-2-thread-37} Job.java[run]:188) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:426)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:101)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:410)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:398)
... 11 more
ERROR [2018-10-17 16:04:55,625] ({pool-2-thread-37} RemoteScheduler.java[getStatus]:256) - Can't get status information
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
I think at this point it's repeating.
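To check whether the errors really are repeating, one option is to count the ERROR entries by source file and message. The following is only a rough Python sketch: it assumes every entry header has the same shape as the ones pasted above (level, timestamp in square brackets, thread in braces, then file, method and line number). The names HEADER and summarize_errors are made up for illustration and are not part of Zeppelin.

```python
import re
from collections import Counter

# Header shape assumed from the log excerpt in this report, e.g.
# ERROR [2018-10-17 16:04:55,618] ({pool-2-thread-37} Job.java[run]:188) - Job failed
HEADER = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<ts>[^\]]+)\]\s+"
    r"\(\{(?P<thread>[^}]+)\}\s+(?P<source>[\w$.]+)\[(?P<method>[^\]]+)\]:(?P<line>\d+)\)"
    r"\s+-\s+(?P<message>.*)$"
)


def summarize_errors(lines):
    """Count ERROR entries by (source file, message).

    Stack frames ("at ...") and "Caused by:" lines do not match the
    header pattern, so they are skipped.
    """
    counts = Counter()
    for raw in lines:
        match = HEADER.match(raw.strip())
        if match and match.group("level") == "ERROR":
            counts[(match.group("source"), match.group("message"))] += 1
    return counts


# Small sample taken from the excerpt above; in practice the lines would
# come from the Zeppelin server log file.
sample = [
    "ERROR [2018-10-17 16:04:55,608] ({Thread-2817} RemoteInterpreterEventPoller.java[run]:113) - Can't get RemoteInterpreterEvent",
    "org.apache.thrift.transport.TTransportException",
    "at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)",
    "ERROR [2018-10-17 16:04:55,618] ({pool-2-thread-37} Job.java[run]:188) - Job failed",
    "Caused by: org.apache.thrift.transport.TTransportException",
]

for (source, message), n in summarize_errors(sample).most_common():
    print(f"{n:4d}  {source}: {message}")
```

If the same few (source, message) pairs dominate the counts over a longer stretch of the log, that would support the impression that the same failure is looping.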