Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.0.0
- Labels: None
Description
In a way, I'd consider this a parent bug of SPARK-7252.
Spark's current support for delegation tokens is a little all over the place:
- for HDFS, there's support for re-creating tokens if a principal and keytab are provided
- for HBase and Hive, Spark will fetch delegation tokens so that apps can work in cluster mode, but will not re-create them, so apps that need those tokens will stop working after 7 days (the default maximum token lifetime)
- for anything else, Spark doesn't do anything. Lots of other services use delegation tokens, and supporting them as data sources in Spark becomes more complicated because of that. For example, Kafka will (hopefully) soon support them.
It would be nice if Spark had consistent support for handling delegation tokens regardless of who needs them. I'd list these as the requirements:
- Spark to provide a generic interface for fetching delegation tokens. This would allow Spark's delegation token support to be extended through a plugin architecture (e.g., Java's ServiceLoader mechanism), so that Spark itself doesn't need to support every possible service out there; a rough sketch of what that could look like follows this list.
This would be used to fetch tokens when launching apps in cluster mode, and when a principal and a keytab are provided to Spark.
- A way to manually update delegation tokens in Spark. For example, a new SparkContext API, or some configuration that tells Spark to monitor a file for changes and load tokens from that file (second sketch below).
This would allow external applications to manage tokens outside of Spark and update a running Spark application (think, for example, a job server like Oozie, or something like Hive-on-Spark, which manages Spark apps running remotely).
- A way to notify running code that new delegation tokens have been loaded.
This may not be strictly necessary; it might be possible for code to detect that, e.g., by peeking into the UserGroupInformation structure. But an event sent to the listener bus would allow applications to react when new tokens are available, e.g., the Hive backend could re-create connections to the metastore server using the new tokens (third sketch below).
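To make the first requirement concrete, here is a rough sketch of what a pluggable provider interface could look like. Everything named here (ServiceCredentialProvider, CredentialProviderRegistry, obtainCredentials) is hypothetical rather than an existing Spark API; the sketch assumes Hadoop's Credentials class as the token container and java.util.ServiceLoader for discovery.
{code:scala}
import java.util.ServiceLoader
import scala.collection.JavaConverters._

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials

// Hypothetical interface: implemented once per service (HDFS, Hive, HBase, Kafka, ...).
trait ServiceCredentialProvider {
  // Short name, useful for logging and for disabling a provider via config.
  def serviceName: String

  // Whether this service actually needs tokens in the current environment.
  def credentialsRequired(hadoopConf: Configuration): Boolean

  // Fetch tokens into `creds`; optionally return the time at which they should be renewed.
  def obtainCredentials(hadoopConf: Configuration, creds: Credentials): Option[Long]
}

// Hypothetical registry: discovers providers on the classpath via ServiceLoader,
// so new services can be supported without changes to Spark itself.
object CredentialProviderRegistry {
  def loadProviders(): Seq[ServiceCredentialProvider] =
    ServiceLoader.load(classOf[ServiceCredentialProvider]).asScala.toSeq

  // Collect tokens from every applicable provider into a single Credentials bag,
  // e.g. when launching an app in cluster mode.
  def obtainAll(hadoopConf: Configuration): Credentials = {
    val creds = new Credentials()
    loadProviders()
      .filter(_.credentialsRequired(hadoopConf))
      .foreach(_.obtainCredentials(hadoopConf, creds))
    creds
  }
}
{code}
Third-party implementations would then be registered the standard ServiceLoader way, via a META-INF/services file naming the implementing classes.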
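For the second requirement, here's a minimal sketch of the file-monitoring option, assuming the file uses Hadoop's standard token storage format (i.e., what Credentials.writeTokenStorageFile produces); TokenFileLoader and maybeReload are made-up names:
{code:scala}
import java.io.{DataInputStream, File, FileInputStream}

import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Hypothetical helper: polls a token file and merges new tokens into the current UGI.
object TokenFileLoader {
  @volatile private var lastModified = 0L

  def maybeReload(tokenFile: File): Unit = {
    val mtime = tokenFile.lastModified()
    if (mtime > lastModified) {
      val in = new DataInputStream(new FileInputStream(tokenFile))
      try {
        val creds = new Credentials()
        // Hadoop's standard token serialization format.
        creds.readTokenStorageStream(in)
        // Merge into the current user's credentials; existing tokens with the
        // same alias are overwritten by the newer ones.
        UserGroupInformation.getCurrentUser.addCredentials(creds)
        lastModified = mtime
      } finally {
        in.close()
      }
    }
  }
}
{code}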
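And for the third requirement, the listener bus notification could look roughly like the following; SparkListenerTokensUpdated is a hypothetical event, not something Spark currently posts:
{code:scala}
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}

// Hypothetical event posted by Spark whenever new tokens have been loaded.
case class SparkListenerTokensUpdated(serializedTokens: Array[Byte]) extends SparkListenerEvent

// Example consumer: something like the Hive backend could listen for the event
// and re-create its metastore connections using the new tokens.
class TokenAwareListener extends SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case SparkListenerTokensUpdated(tokens) =>
      // deserialize `tokens` and rebuild any connections that authenticated
      // with the old, about-to-expire tokens
      ()
    case _ => ()
  }
}
{code}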
Also, cc'ing busbey and steve_l, since you've talked about this on the mailing list recently.
Issue Links
- incorporates
  - SPARK-19143 API in Spark for distributing new delegation tokens (Resolved)
- is depended upon by
  - SPARK-16871 Support getting HBase tokens from multiple clusters dynamically (Resolved)
- is duplicated by
  - SPARK-16342 Add a new Configurable Token Manager for Spark Running on YARN (Closed)
  - SPARK-16612 Introduce a way for users to easily add support for new services that need delegation tokens (Resolved)
- is related to
  - ZEPPELIN-1730 impersonate spark interpreter using --proxy-user (Resolved)
  - SPARK-25689 Move token renewal logic to driver in yarn-client mode (Resolved)
- relates to
  - HBASE-15570 renewable delegation tokens for long-lived spark applications (Resolved)