SPARK-12778: Use of Java Unsafe should take endianness into account


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Component/s: Input/Output

    Description

      In Platform.java, methods of Java Unsafe are called directly without considering endianness.

      In the thread 'Tungsten in a mixed endian environment', Adam Roberts reported data corruption when "spark.sql.tungsten.enabled" is set to true in a mixed-endian environment.

      Platform.java should take endianness into account.
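
One way to read "take endianness into account" is to fix a canonical byte order for data that crosses nodes and swap bytes on hosts whose native order differs. The following is a minimal sketch of that idea, not Spark's actual code or the fix that was adopted: the class `EndianAwareAccess`, the constant `CANONICAL`, and both helper methods are hypothetical names, and `ByteBuffer` stands in for the raw Unsafe reads so the example runs on any host.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of Spark's Platform.java.
final class EndianAwareAccess {
    // Assumption: data exchanged between nodes is laid out little-endian.
    static final ByteOrder CANONICAL = ByteOrder.LITTLE_ENDIAN;

    // Simulates a raw native-order read, as a host with byte order
    // `hostOrder` would see the 8 bytes starting at `offset`.
    static long rawRead(byte[] bytes, int offset, ByteOrder hostOrder) {
        return ByteBuffer.wrap(bytes).order(hostOrder).getLong(offset);
    }

    // Normalizes a raw read: if the host order differs from the canonical
    // order, the bytes are reversed to recover the stored value.
    static long toCanonical(long raw, ByteOrder hostOrder) {
        return hostOrder.equals(CANONICAL) ? raw : Long.reverseBytes(raw);
    }

    public static void main(String[] args) {
        // A worker writes 30 in the canonical (little-endian) layout.
        byte[] wire = ByteBuffer.allocate(8).order(CANONICAL).putLong(30L).array();

        // A big-endian host reads the same bytes without any correction.
        long raw = rawRead(wire, 0, ByteOrder.BIG_ENDIAN);
        // The same read, normalized to the canonical order.
        long fixed = toCanonical(raw, ByteOrder.BIG_ENDIAN);

        System.out.println("raw read:   " + raw);
        System.out.println("normalized: " + fixed);
    }
}
```

On a real host the `hostOrder` argument would come from `ByteOrder.nativeOrder()`; it is passed explicitly here only so both cases can be exercised on one machine.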

      Below is a copy of Adam's report:

      I've been experimenting with DataFrame operations in a mixed-endian environment: a big-endian master with little-endian workers. With Tungsten enabled, I'm encountering data corruption issues.

      For example, with this simple test code:

      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.SQLContext
      
      object SimpleSQL {
        def main(args: Array[String]): Unit = {
          // Exactly one argument is expected: the master URL.
          if (args.length != 1) {
            println("Wrong number of args, you need to specify the master url")
            sys.exit(1)
          }
          val masterURL = args(0)
          println("Setting up Spark context at: " + masterURL)
          val sparkConf = new SparkConf
          val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf)
      
          println("Performing SQL tests")
      
          val sqlContext = new SQLContext(sc)
          println("SQL context set up")
          // Load the sample data and display it unchanged.
          val df = sqlContext.read.json("/tmp/people.json")
          df.show()
          println("Selecting everyone's age and adding one to it")
          df.select(df("name"), df("age") + 1).show()
          println("Showing all people over the age of 21")
          df.filter(df("age") > 21).show()
          // The aggregation below is the step whose output is corrupted.
          println("Counting people by age")
          df.groupBy("age").count().show()
        }
      }
      

      Instead of getting

      +----+-----+
      | age|count|
      +----+-----+
      |null|    1|
      |  19|    1|
      |  30|    1|
      +----+-----+ 
      

      I get the following with my mixed-endian setup:

      +-------------------+-----------------+
      |                age|            count|
      +-------------------+-----------------+
      |               null|                1|
      |1369094286720630784|72057594037927936|
      |                 30|                1|
      +-------------------+-----------------+ 
      

      and on another run:

      +-------------------+-----------------+
      |                age|            count|
      +-------------------+-----------------+
      |                  0|72057594037927936|
      |                 19|                1| 
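
The corrupted values in the first table are what a byte-reversed read of an 8-byte long would produce: 1369094286720630784 equals 19 × 2^56 and 72057594037927936 equals 2^56, which are 19 and 1 with their bytes reversed. This is an arithmetic observation rather than a confirmed diagnosis from the report. The short check below reproduces both numbers; the class and method names are hypothetical.

```java
// Hypothetical demo class; reproduces the corrupted values from the report.
final class SwappedValues {
    // Returns the value a host with the opposite byte order would see
    // when reading the 8 bytes of `value` without any correction.
    static long asSeenByOppositeEndianHost(long value) {
        return Long.reverseBytes(value);
    }

    public static void main(String[] args) {
        // age = 19 read with reversed bytes
        System.out.println(asSeenByOppositeEndianHost(19L)); // 1369094286720630784
        // count = 1 read with reversed bytes
        System.out.println(asSeenByOppositeEndianHost(1L));  // 72057594037927936
    }
}
```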
      

          People

            Assignee: Unassigned
            Reporter: Ted Yu (yuzhihong@gmail.com)
            Votes: 0
            Watchers: 6
