Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
2.2.0, 2.3.0
None
Description
In the SparkSubmit code, we call resolveGlobPaths, which eventually calls getFileStatus, which for HDFS is an RPC call to the NameNode: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L346.
We do this before we call loginUserFromKeytab, which is further down in the same method: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L655.
The result is that the call to resolveGlobPaths fails in secure clusters with:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
A workaround is to run kinit on the host before using spark-submit, but ideally that should not be necessary. A simple fix is to call loginUserFromKeytab before any attempt to interact with HDFS.
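To illustrate the ordering being proposed, here is a minimal sketch, not the actual SparkSubmit change. It assumes hadoop-common is on the classpath; the object name LoginBeforeGlob and the method resolveGlob are hypothetical and exist only for this example. The Hadoop calls used (UserGroupInformation.setConfiguration, UserGroupInformation.loginUserFromKeytab, Path.getFileSystem, FileSystem.globStatus) are the standard ones.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper, for illustration only: it is not part of Spark.
object LoginBeforeGlob {

  /** Expands a glob pattern, logging in from the keytab first when one is supplied. */
  def resolveGlob(
      pattern: String,
      principal: Option[String],
      keytab: Option[String],
      conf: Configuration): Seq[String] = {

    // Step 1: obtain Kerberos credentials BEFORE any filesystem access.
    // The login is skipped unless both principal and keytab are given.
    for (p <- principal; k <- keytab) {
      UserGroupInformation.setConfiguration(conf)
      UserGroupInformation.loginUserFromKeytab(p, k)
    }

    // Step 2: only now talk to the filesystem. On HDFS, globStatus
    // contacts the NameNode, so it needs the credentials from step 1.
    val path = new Path(pattern)
    val fs = path.getFileSystem(conf)

    // globStatus can return null when nothing matches a non-glob path;
    // in that case this sketch returns the pattern unchanged.
    Option(fs.globStatus(path))
      .map(_.toSeq.map(_.getPath.toString))
      .getOrElse(Seq(pattern))
  }
}
```

Swapping the two steps reproduces the situation described above: the glob lookup reaches the NameNode with no TGT available, unless kinit has already been run on the host.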
At least for cluster mode, this would appear to be a regression caused by SPARK-21012.