[SPARK-20435] More thorough redaction of sensitive information from logs/UI, more unit tests - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.2.0
Component/s: Spark Core
Labels: None

Description

~~SPARK-18535~~ and ~~SPARK-19720~~ redacted sensitive information (e.g. the Hadoop credential provider password and AWS access/secret keys) from the event logs, YARN logs, and UI, and from the console output, respectively.

Some unit tests were added along with those changes, but they only asserted that when a sensitive key was found, its value was redacted. They did not assert globally that, when running a full-fledged Spark app (whether on YARN or locally), sensitive information was absent from all of the logs and the UI. Such a test would also prevent future regressions if someone unknowingly adds extra logging that writes sensitive information to disk or the UI.

Consequently, it was found that in some Java configurations, sensitive information was still being leaked in the event logs under the SparkListenerEnvironmentUpdate event, like so:

"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ...

"secret_password" should have been redacted.
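The global check described above, that a known secret appears nowhere in the output of a run, could be sketched roughly as follows. This is an illustrative Python sketch, not the test added to Spark; the function names `files_containing` and `demo`, the file names, and the idea of scanning a single output directory are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def files_containing(root, secret):
    """Return the names of files under root whose text includes secret."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Undecodable bytes are replaced so binary files do not raise.
            text = path.read_text(encoding="utf-8", errors="replace")
            if secret in text:
                hits.append(path.name)
    return hits


def demo():
    """Build a hypothetical log directory with one leaking file and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        leaking = '{"sun.java.command":"... PASSWORD=secret_password"}'
        (Path(tmp) / "eventlog.json").write_text(leaking, encoding="utf-8")
        (Path(tmp) / "stdout.log").write_text("job finished\n", encoding="utf-8")
        return files_containing(tmp, "secret_password")
```

A regression test built on this idea would run an application with a known dummy secret and assert that the returned list is empty.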

Moreover, the redaction logic previously checked only whether the key matched the secret regex pattern and, if so, redacted its value. That worked for most cases. In the above case, however, the key (sun.java.command) says little about its contents, so the value must be searched as well. The check therefore needs to be expanded to match against values too.
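A minimal sketch of the expanded check, in Python rather than Spark's Scala, might look like this. The pattern, the replacement text, and the `redact` helper are hypothetical stand-ins chosen for illustration and are not Spark's actual implementation.

```python
import re

# Hypothetical pattern and placeholder text, assumed for this example only.
SECRET_PATTERN = re.compile(r"secret|password", re.IGNORECASE)
REDACTED = "*********(redacted)"


def redact(pairs, pattern=SECRET_PATTERN):
    """Redact a value when either its key or the value itself matches."""
    result = []
    for key, value in pairs:
        if pattern.search(key) or pattern.search(value):
            result.append((key, REDACTED))
        else:
            result.append((key, value))
    return result


# Sample environment entries modelled on the leak described above.
env = [
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "secret_password"),
    ("sun.java.command",
     "org.apache.spark.deploy.SparkSubmit --conf "
     "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password"),
    ("spark.app.name", "demo"),
]
redacted_env = dict(redact(env))
```

With a key-only check, the `sun.java.command` entry would pass through unchanged, because the key itself does not match the pattern. Matching on the value as well catches it, at the cost of redacting the entire command line.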

Issue Links

links to

[Github] Pull Request #17725 (markgrover)

People

Assignee: Mark Grover

Reporter: Mark Grover

Votes: 0

Watchers: 3

Dates

Created: 22/Apr/17 00:34

Updated: 27/Apr/17 05:16

Resolved: 27/Apr/17 00:06