Zeppelin / ZEPPELIN-3816

After moderate usage, can no longer use Spark2

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.7.3
    • Component/s: zeppelin-interpreter
    • Labels: None
    • Environment:

      Description

      This is Zeppelin installed as part of HDP 2.6.3.0-235

      We have a Zeppelin system being used by a large class. Every interpreter except MD is configured to run in isolated mode with user impersonation. Users primarily use spark2.

      After a while the system becomes unusable. I've been restarting it once a day, but today even that wasn't enough. Once the problem occurs we get the kind of error shown in the log below.

      Restarting my interpreter doesn't help, and indeed I believe this happens to all users.

      Livy2 still works.

      Our system is kerberized. Users automatically get Kerberos credentials when they log in (via PAM).

      ERROR [2018-10-17 16:04:55,608] ({Thread-2817} RemoteInterpreterEventPoller.java[run]:113) - Can't get RemoteInterpreterEvent
      org.apache.thrift.transport.TTransportException
      at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
      at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
      at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
      at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
      at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
      at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
      at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getEvent(RemoteInterpreterService.java:429)
      at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getEvent(RemoteInterpreterService.java:417)
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreterEventPoller.run(RemoteInterpreterEventPoller.java:110)
      ERROR [2018-10-17 16:04:55,620] ({Thread-2819} JobProgressPoller.java[run]:54) - Can not get or update progress
      org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:500)
      at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:121)
      at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:333)
      at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:51)
      Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
      at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
      at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
      at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
      at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
      at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
      at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
      at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpreterService.java:313)
      at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterService.java:298)
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:497)
      ... 3 more
      Caused by: java.net.SocketException: Connection reset
      at java.net.SocketInputStream.read(SocketInputStream.java:209)
      at java.net.SocketInputStream.read(SocketInputStream.java:141)
      at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
      at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
      at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
      at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
      ... 11 more
      ERROR [2018-10-17 16:04:55,618] ({pool-2-thread-37} Job.java[run]:188) - Job failed
      org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:426)
      at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:101)
      at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:410)
      at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
      at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.thrift.transport.TTransportException
      at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
      at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
      at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
      at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
      at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
      at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
      at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
      at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:398)
      ... 11 more
      ERROR [2018-10-17 16:04:55,625] ({pool-2-thread-37} RemoteScheduler.java[getStatus]:256) - Can't get status information
      org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)

      I think the log just repeats itself from this point on.
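
      One way to check whether the log really is repeating would be to tally the ERROR headers by source file and message. The following is only a rough sketch: it assumes every header follows the layout seen in the excerpt above, and the names `HEADER` and `summarize_errors` are made up for this illustration (they are not part of Zeppelin).

```python
import re
from collections import Counter

# Assumed header layout, taken from the excerpt in this report:
# ERROR [2018-10-17 16:04:55,608] ({Thread-2817} RemoteInterpreterEventPoller.java[run]:113) - Can't get RemoteInterpreterEvent
HEADER = re.compile(
    r"^\s*ERROR \[(?P<ts>[^\]]+)\] "
    r"\(\{(?P<thread>[^}]+)\} (?P<source>\S+?)\[(?P<method>[^\]]+)\]:(?P<line>\d+)\) - "
    r"(?P<message>.*)$"
)


def summarize_errors(lines):
    """Count ERROR headers by (source file, message); stack frames are skipped."""
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match:
            key = (match.group("source"), match.group("message").strip())
            counts[key] += 1
    return counts


# Small sample copied from the log above; a real run would iterate over the log file.
sample = [
    "ERROR [2018-10-17 16:04:55,608] ({Thread-2817} RemoteInterpreterEventPoller.java[run]:113) - Can't get RemoteInterpreterEvent",
    "org.apache.thrift.transport.TTransportException",
    "at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)",
    "ERROR [2018-10-17 16:04:55,618] ({pool-2-thread-37} Job.java[run]:188) - Job failed",
    "ERROR [2018-10-17 16:04:55,625] ({pool-2-thread-37} RemoteScheduler.java[getStatus]:256) - Can't get status information",
]

for (source, message), n in summarize_errors(sample).most_common():
    print(f"{n:6d}  {source}: {message}")
```

      If the same few (source, message) pairs dominate the counts over a long stretch of the log, that would support the impression that the errors are cycling rather than changing.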

       

    People

    • Assignee: Unassigned
    • Reporter: Charles Hedrick (clhedrick)
    • Votes: 0
    • Watchers: 2