Spark / SPARK-21033

fix the potential OOM in UnsafeExternalSorter


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      In `UnsafeInMemorySorter`, one record may take 32 bytes: one `long` for the record pointer, one `long` for the key prefix, and two more `long`s as the temporary buffer for radix sort.
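A minimal sketch of the per-record arithmetic above, assuming radix sort is in use. The class and constant names are hypothetical and are not taken from Spark's source; they only spell out the four `long`s listed in the description.

```java
// Hypothetical illustration, not Spark source: per-record footprint of the
// pointer array in UnsafeInMemorySorter when radix sort is used.
public class RecordFootprint {
    static final int POINTER_LONGS = 1;       // record pointer
    static final int KEY_PREFIX_LONGS = 1;    // key prefix
    static final int RADIX_BUFFER_LONGS = 2;  // temporary buffer for radix sort

    // Total bytes one record occupies in the pointer array (8 bytes per long).
    static long bytesPerRecord() {
        return (long) (POINTER_LONGS + KEY_PREFIX_LONGS + RADIX_BUFFER_LONGS) * Long.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerRecord() + " bytes per record"); // prints "32 bytes per record"
    }
}
```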

      In `UnsafeExternalSorter`, we set `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to `1024 * 1024 * 1024 / 2`, hoping that the maximum size of the pointer array would be 8 GB. However, this is wrong: at 32 bytes per record, `1024 * 1024 * 1024 / 2 * 32` is actually 16 GB (the 8 GB estimate only holds at 16 bytes per record). If the pointer array has to grow before this threshold is reached, we may hit the max-page-size error.
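The arithmetic can be checked directly. This sketch assumes 32 bytes per record and a page limit of `((1L << 31) - 1) * 8L` bytes, which reproduces the 17179869176-byte limit quoted in the exception message in this issue. The class and helper names (`pointerArrayBytes`, `maxRecordsPerPage`) are hypothetical, for illustration only.

```java
// Hypothetical check, not Spark source: does the pointer array for the
// spill threshold fit into a single page?
public class SpillThresholdCheck {
    // Threshold quoted in the issue: 1024 * 1024 * 1024 / 2 records (2^29).
    static final long NUM_ELEMENTS_FOR_SPILL_THRESHOLD = 1024L * 1024 * 1024 / 2;

    // Assumed footprint with radix sort: pointer + key prefix + 2 scratch longs.
    static final long BYTES_PER_RECORD = 4L * Long.BYTES;

    // Assumed page limit, matching the number in the error message.
    static final long MAX_PAGE_SIZE_BYTES = ((1L << 31) - 1) * 8L;

    // Bytes needed to hold numRecords in the pointer array.
    static long pointerArrayBytes(long numRecords) {
        return numRecords * BYTES_PER_RECORD;
    }

    // Largest record count whose pointer array still fits in one page.
    static long maxRecordsPerPage() {
        return MAX_PAGE_SIZE_BYTES / BYTES_PER_RECORD;
    }

    public static void main(String[] args) {
        long needed = pointerArrayBytes(NUM_ELEMENTS_FOR_SPILL_THRESHOLD);
        System.out.println("pointer array at threshold: " + needed);            // 17179869184 (16 GiB)
        System.out.println("max page size:              " + MAX_PAGE_SIZE_BYTES); // 17179869176
        System.out.println("fits in one page: " + (needed <= MAX_PAGE_SIZE_BYTES)); // false
        System.out.println("max records per page: " + maxRecordsPerPage());    // 536870911
    }
}
```

Under these assumptions the threshold overshoots the page limit by exactly one record: 2^29 records need 2^34 bytes, while a single page can hold at most 2^29 - 1 records.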

      Users may see an exception like this on large datasets:

      Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes
      at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
      at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
      at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
      at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
      at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
      ...
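The trace shows the request being rejected inside `TaskMemoryManager.allocatePage`. One way to avoid issuing such a request is to check the size of the next pointer-array growth and spill instead when it would exceed the page limit. The sketch below is hypothetical and is not the actual patch for this issue. It assumes a doubling growth policy, 32 bytes per record, and the same page limit as the error message. `PointerArrayGrowthGuard`, `Action`, and `beforeInsert` are illustrative names only.

```java
// Hypothetical sketch, not Spark's implementation: decide whether to insert,
// grow the pointer array, or spill before adding one more record.
public class PointerArrayGrowthGuard {
    static final long MAX_PAGE_SIZE_BYTES = ((1L << 31) - 1) * 8L; // assumed page limit
    static final long BYTES_PER_RECORD = 4L * Long.BYTES;          // assumed: radix sort in use

    enum Action { INSERT, GROW, SPILL }

    static Action beforeInsert(long numRecords, long arrayBytes) {
        long capacity = arrayBytes / BYTES_PER_RECORD; // records the current array can hold
        if (numRecords < capacity) {
            return Action.INSERT;                      // still room, no allocation needed
        }
        long grownBytes = arrayBytes * 2;              // assumed doubling growth policy
        // Spill rather than request a page the memory manager would reject.
        return grownBytes <= MAX_PAGE_SIZE_BYTES ? Action.GROW : Action.SPILL;
    }

    public static void main(String[] args) {
        long eightGiB = 1L << 33;
        System.out.println(beforeInsert(0, 1024));                               // INSERT
        System.out.println(beforeInsert(32, 1024));                              // GROW
        System.out.println(beforeInsert(eightGiB / BYTES_PER_RECORD, eightGiB)); // SPILL
    }
}
```

Doubling an 8 GiB array would request 2^34 bytes, which is 8 bytes over the limit, so under these assumptions the guard spills at that point instead of failing.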
      


            People

              Assignee: Wenchen Fan (cloud_fan)
              Reporter: Wenchen Fan (cloud_fan)
              Votes: 0
              Watchers: 8
