Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.7.3
Fix Version/s: None
Component/s: None
Description
Scenario:
36-hour test
14-node secured, encrypted cluster (CentOS 7 based)
Simulated load of around 13 users periodically running a set of 19 notebooks on a defined schedule
After 24 hours, Zeppelin stopped functioning.
Issue 1:
Unable to create a new notebook or update an existing one.
Issue 2:
Unable to modify interpreter settings. The Save action never completes in the UI.
Issue 3:
Unable to run paragraphs.
The following errors appear in the Zeppelin logs:
WARN [2017-12-19 13:18:48,128] ({qtp1076835071-86681} Client.java[run]:715) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
INFO [2017-12-19 13:18:48,128] ({qtp1076835071-86681} RetryInvocationHandler.java[log]:280) - java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "ctr-e136-1513029738776-12293-01-000004.hwx.site/172.27.22.148"; destination host is: "ctr-e136-1513029738776-12293-01-000004.hwx.site":8020; , while invoking ClientNamenodeProtocolTranslatorPB.create over ctr-e136-1513029738776-12293-01-000004.hwx.site/172.27.22.148:8020 after 12 failover attempts. Trying to failover after sleeping for 15905ms.