Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-7542

Support off-heap sort buffer in UnsafeExternalSorter

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.6.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      UnsafeExternalSorter, introduced in SPARK-7081, uses on-heap long[] arrays as its sort buffers. When records are small, the sorting array might be as large as the data pages, so it would be useful to be able to allocate this array off-heap (using our unsafe LongArray). Unfortunately, we can't currently do this because TimSort calls allocate() to create data buffers but doesn't call any corresponding cleanup methods to free them.

      We should look into extending TimSort with buffer freeing methods, then consider switching to LongArray in UnsafeShuffleSortDataFormat.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                davies Davies Liu
                Reporter:
                joshrosen Josh Rosen
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: