Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- 2.0.0
Description
The following failing test demonstrates a bug where Spark mis-encodes array-of-struct fields if whole-stage codegen is disabled:
withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
  val data = Array(Array((1, 2), (3, 4)))
  val ds = spark.sparkContext.parallelize(data).toDS()
  assert(ds.collect() === data)
}
When whole-stage codegen is enabled (the default), this works fine. When it's disabled, as in the test above, Spark returns Array(Array((3,4), (3,4))). Because the last element of the array appears to be repeated, my best guess is that the interpreted evaluation code path is missing a copy() somewhere.
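To illustrate the kind of mistake guessed at here, the following is a minimal sketch in plain Scala with no Spark dependency. MutableRow, encodeWithoutCopy, and encodeWithCopy are hypothetical names invented for this illustration. They are not Spark's actual classes, and this is not a confirmed diagnosis of the bug. The sketch only shows how reusing one mutable object without copying it makes every stored element look like the last one written.

```scala
// Hypothetical stand-in for a reused mutable row buffer (not a Spark class).
final class MutableRow(var a: Int, var b: Int) {
  def copy(): MutableRow = new MutableRow(a, b)
  def toTuple: (Int, Int) = (a, b)
}

object CopyBugSketch {
  // Buggy variant: every output slot holds a reference to the same object,
  // so later writes overwrite what earlier slots appear to contain.
  def encodeWithoutCopy(input: Seq[(Int, Int)]): Seq[(Int, Int)] = {
    val reused = new MutableRow(0, 0)
    val stored = input.toList.map { case (a, b) =>
      reused.a = a
      reused.b = b
      reused // no copy(): all entries alias one row
    }
    stored.map(_.toTuple)
  }

  // Fixed variant: snapshot the reused row before storing it.
  def encodeWithCopy(input: Seq[(Int, Int)]): Seq[(Int, Int)] = {
    val reused = new MutableRow(0, 0)
    val stored = input.toList.map { case (a, b) =>
      reused.a = a
      reused.b = b
      reused.copy() // each entry gets its own row
    }
    stored.map(_.toTuple)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((1, 2), (3, 4))
    // Without copy(), both elements come back as (3,4).
    println(encodeWithoutCopy(data))
    // With copy(), the original elements are preserved.
    println(encodeWithCopy(data))
  }
}
```

The buggy variant reproduces the same symptom as the failing test (the last element repeated in every position), which is why a missing copy() is a plausible first place to look.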
Attachments
Issue Links
- relates to
  - SPARK-17061 Incorrect results returned following a join of two datasets and a map step where total number of columns >100 (Resolved)
- links to