[SPARK-12778] Use of Java Unsafe should take endianness into account - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: Input/Output
Labels:
- bulk-closed

Description

Platform.java calls the methods of Java's Unsafe directly, without accounting for the byte order (endianness) of the host.

In the thread 'Tungsten in a mixed endian environment', Adam Roberts reported data corruption when "spark.sql.tungsten.enabled" is turned on in a mixed-endian environment.

Platform.java should take endianness into account.
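As a rough illustration of what "taking endianness into account" could mean, here is a minimal sketch, not Spark's actual code. The class EndianAwareLongs, its method names, and the choice of little-endian as the shared byte order are all assumptions made for this example, and ByteBuffer in native order stands in for a raw Unsafe store. The idea is to byte-swap values on hosts whose native order differs from the agreed order, so the bytes in memory look the same on every machine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Spark's Platform class.
class EndianAwareLongs {
    // Assumed canonical byte order for data shared between machines.
    static final ByteOrder WIRE_ORDER = ByteOrder.LITTLE_ENDIAN;

    // True when this JVM's native order differs from the canonical order.
    static final boolean NEEDS_SWAP = ByteOrder.nativeOrder() != WIRE_ORDER;

    // Byte reversal is its own inverse, so one method converts in both directions.
    static long convert(long value) {
        return NEEDS_SWAP ? Long.reverseBytes(value) : value;
    }

    // Stand-in for a raw native-order store such as Unsafe.putLong.
    static void putLongNative(byte[] buf, int offset, long value) {
        ByteBuffer.wrap(buf).order(ByteOrder.nativeOrder()).putLong(offset, value);
    }

    // Stand-in for a raw native-order load such as Unsafe.getLong.
    static long getLongNative(byte[] buf, int offset) {
        return ByteBuffer.wrap(buf).order(ByteOrder.nativeOrder()).getLong(offset);
    }

    // Endianness-aware store: bytes always end up in WIRE_ORDER.
    static void writeLong(byte[] buf, int offset, long value) {
        putLongNative(buf, offset, convert(value));
    }

    // Endianness-aware load: interprets the bytes as WIRE_ORDER on any host.
    static long readLong(byte[] buf, int offset) {
        return convert(getLongNative(buf, offset));
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        writeLong(buf, 0, 19L);
        // Round trip on this host.
        System.out.println(readLong(buf, 0));
        // Same bytes decoded explicitly as WIRE_ORDER, independent of the host.
        System.out.println(ByteBuffer.wrap(buf).order(WIRE_ORDER).getLong(0));
    }
}
```

Both lines print 19 regardless of the host's native order. Whether a swap on every access is acceptable for Tungsten's performance goals is a separate question that this sketch does not address.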

Below is a copy of Adam's report:

I've been experimenting with DataFrame operations in a mixed-endian environment: a big-endian master with little-endian workers. With Tungsten enabled, I'm encountering data corruption issues.

For example, with this simple test code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SimpleSQL {
  def main(args: Array[String]): Unit = {
    if (args.length != 1) {
      println("Expected exactly one argument: the master URL")
      // Stop here; otherwise args(0) below throws when no argument is given
      sys.exit(1)
    }
    val masterURL = args(0)
    println("Setting up Spark context at: " + masterURL)
    val sparkConf = new SparkConf
    val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf)

    println("Performing SQL tests")

    val sqlContext = new SQLContext(sc)
    println("SQL context set up")
    val df = sqlContext.read.json("/tmp/people.json")
    df.show()
    println("Selecting everyone's age and adding one to it")
    df.select(df("name"), df("age") + 1).show()
    println("Showing all people over the age of 21")
    df.filter(df("age") > 21).show()
    println("Counting people by age")
    df.groupBy("age").count().show()
  }
}

Instead of getting

+----+-----+
| age|count|
+----+-----+
|null|    1|
|  19|    1|
|  30|    1|
+----+-----+

I get the following with my mixed-endian setup:

+-------------------+-----------------+
|                age|            count|
+-------------------+-----------------+
|               null|                1|
|1369094286720630784|72057594037927936|
|                 30|                1|
+-------------------+-----------------+

and on another run:

+-------------------+-----------------+
|                age|            count|
+-------------------+-----------------+
|                  0|72057594037927936|
|                 19|                1|
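
One observation on the corrupted values in the first run, offered as a sanity check and not as part of the original report: they are exactly what results when an 8-byte long is written in one byte order and read back in the other. The sketch below uses a hypothetical ByteSwapDemo class with plain ByteBuffer calls (no Spark or Unsafe code) to reproduce the numbers from the expected values 19 and 1.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; it only illustrates the arithmetic of a byte swap.
class ByteSwapDemo {
    // Write a long in big-endian order, then read the same 8 bytes as little-endian.
    static long misread(long value) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.order(ByteOrder.BIG_ENDIAN).putLong(0, value);
        return buf.order(ByteOrder.LITTLE_ENDIAN).getLong(0);
    }

    public static void main(String[] args) {
        System.out.println(misread(19L)); // 1369094286720630784, i.e. 19 * 2^56
        System.out.println(misread(1L));  // 72057594037927936, i.e. 2^56
    }
}
```

The same results come from Long.reverseBytes(19L) and Long.reverseBytes(1L). This accounts for the age and count shown in the first corrupted table; it does not by itself explain the age of 0 seen in the second run.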

Issue Links

links to

[Github] Pull Request #10725 (tedyu)

People

Assignee: Unassigned

Reporter: Ted Yu

Votes: 0

Watchers: 6

Dates

Created: 12/Jan/16 15:28

Updated: 21/May/19 04:34

Resolved: 21/May/19 04:34