Description
Currently, JDBC data source credentials are not masked in the explain output. This can lead to accidental leakage of credentials into logs and the UI.
SPARK-11206 added support for showing SQL plan details in the History Server. Since that change, query plans are also written to the event logs on disk whenever event logging is enabled, so the credentials leak into the event logs as well, where they can be read by file system admins.
Repro:
val empdf = sqlContext.read.jdbc("jdbc:postgresql://localhost:5432/mydb", "spark_emp", psqlProps)
empdf.explain(true)
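The repro assumes psqlProps is already defined. A minimal sketch of that setup (the exact construction is an assumption; the user/password values are the ones visible in the plan output below):

import java.util.Properties

// JDBC connection properties; these are the values that leak into the plan output.
val psqlProps = new Properties()
psqlProps.setProperty("user", "dbuser")
psqlProps.setProperty("password", "pwdata")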
Plan output with credentials:
== Parsed Logical Plan ==
Limit 21
+- Relation[id#4,name#5] JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser, password=pwdata})

== Analyzed Logical Plan ==
id: int, name: string
Limit 21
+- Relation[id#4,name#5] JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser, password=pwdata})

== Optimized Logical Plan ==
Limit 21
+- Relation[id#4,name#5] JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser, password=pwdata})

== Physical Plan ==
Limit 21
+- Scan JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser, password=pwdata}) PushedFilter: [] [id#4,name#5]
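One possible direction for a fix, sketched here as an illustration only (the helper name and the set of sensitive keys are assumptions, not an existing Spark API): redact sensitive entries from the connection properties before the relation renders them into its plan string.

import java.util.Properties
import scala.collection.JavaConverters._

// Hypothetical helper: copy the JDBC properties with sensitive values masked,
// so plan strings (explain output, UI, event logs) never contain the password.
def redactProperties(props: Properties): Properties = {
  val sensitiveKeys = Set("password")
  val redacted = new Properties()
  props.asScala.foreach { case (k, v) =>
    redacted.setProperty(k, if (sensitiveKeys.contains(k.toLowerCase)) "*********" else v)
  }
  redacted
}

// If the relation's toString rendered the redacted copy instead, the plan would show, e.g.:
//   JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,...,{user=dbuser, password=*********})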