Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-25081

Nested spill in ShuffleExternalSorter may access a released memory page

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 1.6.1, 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2
    • Fix Version/s: 2.2.3, 2.3.3, 2.4.0
    • Component/s: Spark Core
    • Labels:

      Description

      This issue is pretty similar to SPARK-21907.
      "allocateArray" in ShuffleInMemorySorter.reset may trigger a spill and cause ShuffleInMemorySorter access the released `array`. Another task may get the same memory page from the pool. This will cause two tasks access the same memory page. When a task reads memory written by another task, many types of failures may happen. Here are some examples I have seen:

      • JVM crash. (This is easy to reproduce in a unit test as we fill newly allocated and deallocated memory with 0xa5 and 0x5a bytes which usually points to an invalid memory address)
      • java.lang.IllegalArgumentException: Comparison method violates its general contract!
      • java.lang.NullPointerException at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384)
      • java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size -536870912 because the size after growing exceeds size limitation 2147483632

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                zsxwing Shixiong Zhu
                Reporter:
                zsxwing Shixiong Zhu
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: