Details
Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 1.6.0
Fix Version/s: None
Environment: Problems apparent on big-endian (BE) platforms; little-endian (LE) platforms could be impacted too
Flags: Important
Description
This JIRA covers endian-specific problems. Since testing 1.6, I've noticed problems with DataFrames on big-endian (BE) platforms, e.g. https://issues.apache.org/jira/browse/SPARK-9858
Current progress: using com.google.common.io.LittleEndianDataInputStream and com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer fixes three test failures in ExchangeCoordinatorSuite, but I'm concerned about the performance and wider functional implications of this change.
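As a minimal illustration of why the serializer's byte order matters, the sketch below encodes an int with java.nio.ByteBuffer in big-endian order (the same layout java.io.DataOutputStream.writeInt produces) and in little-endian order (the same layout Guava's LittleEndianDataOutputStream.writeInt produces). It is not Spark's UnsafeRowSerializer; the class name, method names and the "row size" value are hypothetical. A reader using the matching order round-trips the value, while a mismatched reader silently returns a different number.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only: not Spark's UnsafeRowSerializer.
// Shows how the byte order used by the writer and by the reader decides
// how a serialized int (e.g. a row-length header) is interpreted.
public class EndianRoundTrip {

    // Encode an int into 4 bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decode an int from 4 bytes using the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int rowSize = 24; // hypothetical row length in bytes
        byte[] be = encode(rowSize, ByteOrder.BIG_ENDIAN);    // layout of DataOutputStream.writeInt
        byte[] le = encode(rowSize, ByteOrder.LITTLE_ENDIAN); // layout of LittleEndianDataOutputStream.writeInt
        System.out.println("native order: " + ByteOrder.nativeOrder());
        System.out.println("BE bytes: " + Arrays.toString(be)); // [0, 0, 0, 24]
        System.out.println("LE bytes: " + Arrays.toString(le)); // [24, 0, 0, 0]
        // Matching orders round-trip; mismatched orders corrupt the value.
        System.out.println(decode(be, ByteOrder.BIG_ENDIAN)); // 24
        System.out.println(decode(le, ByteOrder.BIG_ENDIAN)); // 402653184
    }
}
```

Whichever order is chosen, the point is that every writer and reader of the same bytes must agree on it; switching only one side of the stream would trade one failure for another.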
The test "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input with reordering" fails: we expect "one, 1" but instead get "one, 9". We believe the issue lies within BitSetMethods.java, specifically around: return (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word);
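To make the quoted expression concrete, here is a sketch of a word-at-a-time "next set bit" search built around the same return expression. It is an assumption-laden stand-in, not the code in BitSetMethods.java: it reads from a plain long[] instead of Platform-based memory, and NextSetBitSketch and wordFrom are hypothetical names. The main method reads the same 8 raw bytes as a little-endian and as a big-endian word; the bit reported as index 0 under one order is reported as index 56 under the other, which is the kind of discrepancy that could appear if bitset words were written and read with different byte orders.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: operates on a long[] rather than Spark's Platform/Unsafe memory.
public class NextSetBitSketch {

    // Returns the index of the first set bit at or after fromIndex, or -1 if none.
    static int nextSetBit(long[] words, int fromIndex) {
        int wi = fromIndex >> 6;            // word index (64 bits per word)
        if (wi >= words.length) {
            return -1;
        }
        int subIndex = fromIndex & 0x3f;    // bit offset inside that word
        long word = words[wi] >>> subIndex; // discard bits below fromIndex
        if (word != 0) {
            return (wi << 6) + subIndex + Long.numberOfTrailingZeros(word);
        }
        // Scan the remaining words.
        for (wi = wi + 1; wi < words.length; wi++) {
            if (words[wi] != 0) {
                return (wi << 6) + Long.numberOfTrailingZeros(words[wi]);
            }
        }
        return -1;
    }

    // Reads one 64-bit word from 8 raw bytes using the given byte order.
    static long wordFrom(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getLong();
    }

    public static void main(String[] args) {
        byte[] raw = new byte[8];
        raw[0] = 0x01; // lowest bit of the first byte in memory is set
        long le = wordFrom(raw, ByteOrder.LITTLE_ENDIAN);
        long be = wordFrom(raw, ByteOrder.BIG_ENDIAN);
        System.out.println(nextSetBit(new long[] {le}, 0)); // 0
        System.out.println(nextSetBit(new long[] {be}, 0)); // 56
    }
}
```

The arithmetic itself is byte-order neutral: (wi << 6) selects the word, and numberOfTrailingZeros finds the bit within it. The result is only wrong if the 64-bit word it receives was assembled from memory in a different order than the one used to set the bits.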