Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 1.6.2
- Fix Version/s: None
- Component/s: None
Description
Say a user wants to run a Spark job on YARN in a secure Hadoop cluster, and that job requires access to some service for which the Spark framework currently does not support fetching delegation tokens. If the job runs in client mode, the user can simply fetch a delegation token for that service in their own code. If, on the other hand, the job runs in cluster mode, there does not appear to be any way for the user to get that delegation token and propagate it to the driver (apart from modifying Spark's own code to add support for the service in question).
It would be helpful if there were a configuration property specifying the list of supported services (perhaps defaulting to "hbase,hive"). For each service in this list, there could be two additional configuration properties: one to enable or disable fetching of tokens, like the existing spark.yarn.security.tokens.{service}.enabled properties, and one to specify a class that implements the actual task of fetching a delegation token for the service. Where Spark currently fetches tokens for HBase and the Hive metastore directly, it would instead loop over the list of configured services and use the specified class to obtain a token for each of them in turn.
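A minimal sketch of what such configuration might look like. The "services" and "provider" property names here are hypothetical (extrapolated from the existing spark.yarn.security.tokens.{service}.enabled pattern), and com.example.MyServiceTokenProvider is an illustrative user-supplied class:

```
spark.yarn.security.tokens.services            hbase,hive,myservice
spark.yarn.security.tokens.myservice.enabled   true
spark.yarn.security.tokens.myservice.provider  com.example.MyServiceTokenProvider
```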
With this change, if a user wanted to add support for a new service, they would not have to modify Spark; instead, they would just have to write a class implementing the task of obtaining a token for that service, include that class on spark-submit's classpath, and change the relevant configuration properties to tell Spark to fetch tokens for that service using that class.
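The workflow above could be sketched roughly as follows. This is a hypothetical illustration, not a real Spark API: the ServiceTokenProvider interface, the property names, and the byte-array stand-in for a Hadoop delegation token are all assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plugin interface a user-supplied token-fetching class would implement.
interface ServiceTokenProvider {
    String serviceName();
    byte[] obtainToken();  // stand-in for a real Hadoop delegation token
}

// Example user implementation, shipped on spark-submit's classpath.
class MyServiceTokenProvider implements ServiceTokenProvider {
    public String serviceName() { return "myservice"; }
    public byte[] obtainToken() { return "token-for-myservice".getBytes(); }
}

public class TokenLoop {
    // Mirrors the proposed behavior: loop over the configured services and,
    // for each enabled one, instantiate the configured class by reflection
    // and use it to fetch a token.
    public static Map<String, byte[]> fetchAll(Map<String, String> conf) throws Exception {
        Map<String, byte[]> tokens = new LinkedHashMap<>();
        String list = conf.getOrDefault("spark.yarn.security.tokens.services", "");
        for (String service : list.split(",")) {
            service = service.trim();
            if (service.isEmpty()) continue;
            boolean enabled = Boolean.parseBoolean(conf.getOrDefault(
                "spark.yarn.security.tokens." + service + ".enabled", "true"));
            if (!enabled) continue;
            String className = conf.get("spark.yarn.security.tokens." + service + ".provider");
            ServiceTokenProvider provider = (ServiceTokenProvider)
                Class.forName(className).getDeclaredConstructor().newInstance();
            tokens.put(provider.serviceName(), provider.obtainToken());
        }
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.yarn.security.tokens.services", "myservice");
        conf.put("spark.yarn.security.tokens.myservice.provider", "MyServiceTokenProvider");
        System.out.println(new String(fetchAll(conf).get("myservice")));
    }
}
```

The key design point is that Spark itself needs no knowledge of the service: everything it needs arrives through configuration and the classpath.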
Issue Links
- duplicates: SPARK-14743 Improve delegation token handling in secure clusters (Resolved)