Description
Severity: Important
Vendor: The Apache Software Foundation
Versions affected:
All Spark 1.x, Spark 2.0.x, Spark 2.1.x, and 2.2.x versions
Spark 2.3.0 to 2.3.2
Description:
Prior to Spark 2.3.3, in certain situations Spark would write user data to local disk unencrypted, even if spark.io.encryption.enabled=true. The affected code paths are: cached blocks that are fetched to disk (controlled by spark.maxRemoteBlockSizeFetchToMem); parallelize in SparkR; broadcast and parallelize in PySpark; and Python UDFs.
Mitigation:
Users of Spark 1.x, 2.0.x, 2.1.x, 2.2.x, and 2.3.x should upgrade to 2.3.3 or newer (this includes 2.4.x).
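As a rough aid for auditing deployments, the affected range listed above can be expressed as a small version check. This is only a sketch: the helper names `parse_version` and `is_affected` are hypothetical and not part of Spark, and the check assumes plain `major.minor.patch` version strings (vendor-patched builds may carry the fix under an older version number, which this cannot detect).

```python
import re

# First release that contains the fix, per the advisory.
FIXED_IN = (2, 3, 3)
# Oldest release line the advisory lists as affected (all of Spark 1.x).
OLDEST_LISTED = (1, 0, 0)


def parse_version(version: str) -> tuple:
    """Return (major, minor, patch) from a string such as '2.3.2' or '2.3.2-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """True if the version falls in the range the advisory lists as affected."""
    return OLDEST_LISTED <= parse_version(version) < FIXED_IN


if __name__ == "__main__":
    for v in ("1.6.3", "2.2.1", "2.3.2", "2.3.3", "2.4.0"):
        print(v, "affected" if is_affected(v) else "not affected")
```

In a running PySpark session, the version string could be taken from `spark.version` and passed to `is_affected`.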
Credit:
This issue was reported by Thomas Graves of NVIDIA.
References:
https://spark.apache.org/security.html
The following commits were used to fix this issue in branch-2.3 (there may be other, equivalent commits in master / branch-2.4):
commit 575fea120e25249716e3f680396580c5f9e26b5b
Author: Imran Rashid <irashid@cloudera.com>
Date:   Wed Aug 22 16:38:28 2018 -0500

    [CORE] Updates to remote cache reads

    Covered by tests in DistributedSuite

commit 6d742d1bd71aa3803dce91a830b37284cb18cf70
Author: Imran Rashid <irashid@cloudera.com>
Date:   Thu Sep 6 12:11:47 2018 -0500

    [PYSPARK][SQL] Updates to RowQueue

    Tested with updates to RowQueueSuite

commit 09dd34cb1706f2477a89174d6a1a0f17ed5b0a65
Author: Imran Rashid <irashid@cloudera.com>
Date:   Mon Aug 13 21:35:34 2018 -0500

    [PYSPARK] Updates to pyspark broadcast

commit 12717ba0edfa5459c9ac2085f46b1ecc0ee759aa
Author: hyukjinkwon <gurwls223@apache.org>
Date:   Mon Sep 24 19:25:02 2018 +0800

    [SPARKR] Match pyspark features in SparkR communication protocol