Zeppelin / ZEPPELIN-4973

Zeppelin Spark jobs hang and fail with a different error each time.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.2
    • Fix Version/s: 0.8.2
    • Component/s: Interpreters, spark
    • Labels: None
    • Flags: Important

    Description

      Hi,
      I have a Kerberized cluster, and my Kerberos ticket is renewed each day, so I always have a valid key. When I run a Spark job from Zeppelin, the job first hangs for 2.5 to 3 hours, and then I get the error below.

      GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
          at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
          at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
          at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
          at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
          at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
          at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
          at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
          at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
          at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:594)
          at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:396)
          at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:761)
          at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:757)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at ...

      I've enabled user impersonation in Zeppelin, which is why the Zeppelin keytab and principal are passed to the Spark interpreter through the properties zeppelin.spark.keytab and zeppelin.spark.principal.
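
      To help separate a keytab problem from a Zeppelin problem, the keytab can be checked outside Zeppelin. The following is only a rough sketch, assuming MIT Kerberos command-line tools (kinit, klist) are installed on the Zeppelin host. The keytab path and principal shown are hypothetical placeholders, not values from this cluster, and the helper names are made up for illustration.

```python
import shutil
import subprocess


def kinit_command(keytab, principal):
    """Build the kinit command line that requests a TGT using a keytab."""
    # -k: use a keytab instead of prompting for a password; -t: keytab file.
    return ["kinit", "-kt", keytab, principal]


def has_valid_ticket():
    """Check the current user's ticket cache with `klist -s`.

    Returns True or False based on the exit status of `klist -s`
    (0 means a readable, unexpired cache), or None if klist is not on PATH.
    """
    if shutil.which("klist") is None:
        return None
    return subprocess.run(["klist", "-s"]).returncode == 0


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute the real keytab and principal.
    cmd = kinit_command("/path/to/zeppelin.keytab", "zeppelin@EXAMPLE.COM")
    print("to obtain a ticket, run:", " ".join(cmd))
    print("valid ticket in cache:", has_valid_ticket())
```

      This only inspects the ticket cache of the OS user who runs the script. The interpreter process may run as a different user or use a different cache, so a positive result here does not rule out the GSSException above.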

      The error above is the one I usually get. Strangely, though, I sometimes get the following error instead, also after 2.5 to 3 hours:

      java.lang.NullPointerException
          at org.apache.thrift.transport.TSocket.open(TSocket.java:170)
          at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
          at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
          at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
          at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)

      Attachments

        1. zeppelin_sparkjob.PNG
          5 kB
          Divya Goel
        2. zeppelin_error.PNG
          45 kB
          Divya Goel

          People

            Assignee: Unassigned
            Reporter: Divya Goel (Zcode)
            Votes: 0
            Watchers: 1

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified