Spark / SPARK-10877

Assertions fail straightforward DataFrame job due to word alignment

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Some code that I run in a unit test suite is failing with an assertion error.

      I have translated the failing JUnit test into a Scala script, which I will attach to this ticket. The assertion error is the following:

      Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.AssertionError: lengthInBytes must be a multiple of 8 (word-aligned)
      at org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeWords(Murmur3_x86_32.java:53)
      at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.hashCode(UnsafeArrayData.java:289)
      at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.hashCode(rows.scala:149)
      at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.hashCode(rows.scala:247)
      at org.apache.spark.HashPartitioner.getPartition(Partitioner.scala:85)
      at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
      at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
      at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      

      However, the same code runs normally and computes the correct result when assertions are turned off.
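
      For context, Java `assert` statements are only evaluated when the JVM runs with assertions enabled (for example with `-ea`), which would explain why the job only fails in that mode. The sketch below illustrates this; `AssertionProbe` and its methods are hypothetical names for illustration and are not part of Spark. Only the assertion message is copied from the stack trace above.

```java
// Hypothetical illustration, not Spark code.
final class AssertionProbe {

    // Returns true only when the JVM evaluates `assert` statements (e.g. started with -ea).
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert runs only if assertions are enabled.
        assert enabled = true;
        return enabled;
    }

    // Returns true if a word-alignment assert, modeled on the message in the
    // stack trace, throws for the given length.
    static boolean failsWordAlignedAssert(int lengthInBytes) {
        try {
            assert lengthInBytes % 8 == 0
                : "lengthInBytes must be a multiple of 8 (word-aligned)";
            return false;
        } catch (AssertionError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        System.out.println("length 12 trips the assert: " + failsWordAlignedAssert(12));
    }
}
```

      With assertions disabled, the check is skipped entirely, so a length of 12 passes through to whatever code follows the assert.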

      I traced the code and found that hashUnsafeWords was called with a byte length of 12, which is not a multiple of 8. The job nevertheless computes the correct result. Disabling assertions for my unit tests is not an acceptable workaround, though.
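
      To make the arithmetic concrete, here is a minimal sketch of the alignment check and of rounding a length up to the next 8-byte word. `WordAlignmentSketch`, `isWordAligned`, and `roundUpToWord` are hypothetical names, and padding to a word boundary is only one conceivable remedy, not necessarily how Spark addressed this issue.

```java
// Hypothetical illustration, not Spark code.
final class WordAlignmentSketch {

    private static final int WORD_SIZE_BYTES = 8;

    // True when the length is a multiple of 8, the condition the assertion requires.
    static boolean isWordAligned(int lengthInBytes) {
        return lengthInBytes % WORD_SIZE_BYTES == 0;
    }

    // Rounds a non-negative length up to the next multiple of 8.
    static int roundUpToWord(int lengthInBytes) {
        return (lengthInBytes + (WORD_SIZE_BYTES - 1)) & ~(WORD_SIZE_BYTES - 1);
    }

    public static void main(String[] args) {
        int observed = 12; // the byte length reported in this ticket
        System.out.println(observed + " aligned? " + isWordAligned(observed));
        System.out.println(observed + " rounded up: " + roundUpToWord(observed));
    }
}
```

      A length of 12 is a multiple of 4 but not of 8, so it fails the word-alignment check while still being a whole number of 4-byte values.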

      A few things we need to understand:

      1. Why is lengthInBytes equal to 12?
      2. Is it actually a problem that the byte length is not word-aligned? If so, how should we fix the byte length? If it is not a problem, why is the assertion flagging a false positive?
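
      One possible angle on the second question: the body of the standard 32-bit Murmur3 algorithm consumes its input in 4-byte blocks, so a 12-byte input is three whole blocks. The sketch below is a generic Murmur3 (x86, 32-bit) over whole 4-byte blocks, written from the public algorithm; it is an assumption that Spark's hashUnsafeWords reads its input the same way, and `BlockwiseMurmur3Sketch` is a hypothetical name, not Spark code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of block-wise Murmur3 (x86, 32-bit); not Spark code.
final class BlockwiseMurmur3Sketch {

    private static int mixK1(int k1) {
        k1 *= 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= 0x1b873593;
        return k1;
    }

    private static int mixH1(int h1, int k1) {
        h1 ^= k1;
        h1 = Integer.rotateLeft(h1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    // Final avalanche step of Murmur3.
    private static int fmix(int h1, int length) {
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Hashes input whose length is a multiple of 4, one little-endian int at a time.
    static int hashIntBlocks(byte[] data, int seed) {
        if (data.length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int h1 = seed;
        while (buf.remaining() >= 4) {
            h1 = mixH1(h1, mixK1(buf.getInt()));
        }
        return fmix(h1, data.length);
    }

    public static void main(String[] args) {
        // 12 bytes are consumed as three 4-byte blocks with nothing left over.
        System.out.println(hashIntBlocks(new byte[12], 42));
    }
}
```

      Under that assumption, a length of 12 never causes a read past the end of the data, which would be consistent with the job producing correct results when the 8-byte assertion is disabled.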

      Attachments

        1. SparkFilterByKeyTest.scala
          3 kB
          Matt Cheah

          People

            Assignee: Davies Liu (davies)
            Reporter: Matt Cheah (mcheah)