Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version/s: 2.4.7, 3.0.1
Description
Consider the following sequence of events:
- UnsafeExternalSorter runs out of space in its pointer array and attempts to allocate a large array to replace the current one.
- TaskMemoryManager tries to allocate the memory backing the large array using MemoryManager, but MemoryManager is only willing to return most but not all of the memory requested.
- TaskMemoryManager asks UnsafeExternalSorter to spill, which causes UnsafeExternalSorter to spill the current run to disk, to free its record pages and to reset its UnsafeInMemorySorter.
- UnsafeInMemorySorter frees its pointer array, and tries to allocate a new small pointer array.
- TaskMemoryManager tries to allocate the memory backing the small array using MemoryManager, but MemoryManager is unwilling to give it any memory, as the TaskMemoryManager is still holding on to the memory it got for the large array.
- TaskMemoryManager again asks UnsafeExternalSorter to spill, but this time there is nothing to spill.
- UnsafeInMemorySorter receives less memory than it requested, and causes a SparkOutOfMemoryError to be thrown, which causes the current task to fail.
A simple way to fix this is to avoid allocating a new array in UnsafeInMemorySorter.reset() and to allocate it on demand instead.
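
The difference between allocating in reset() and allocating on demand can be illustrated with a small, self-contained Java sketch. This is not Spark code: ToyAllocator, ToySorter, ToyOutOfMemoryError and spillSurvives are hypothetical names invented for illustration, and the allocator's "refuse everything" switch is a simplifying assumption standing in for a MemoryManager that will not grant more memory while the large array's memory is still held.

```java
// Toy model only: none of these classes exist in Spark.
class LazyResetSketch {

    /** Hypothetical stand-in for SparkOutOfMemoryError. */
    static class ToyOutOfMemoryError extends RuntimeException {
        ToyOutOfMemoryError(String msg) { super(msg); }
    }

    /** Hypothetical allocator: refuses every request while `available` is false. */
    static class ToyAllocator {
        boolean available = true;

        long[] allocate(int size) {
            if (!available) {
                throw new ToyOutOfMemoryError("cannot allocate " + size + " slots");
            }
            return new long[size];
        }
    }

    /** Toy in-memory sorter; `eagerReset` selects eager or on-demand reallocation. */
    static class ToySorter {
        private final ToyAllocator allocator;
        private final int initialSize;
        private final boolean eagerReset;
        private long[] array;
        private int pos;

        ToySorter(ToyAllocator allocator, int initialSize, boolean eagerReset) {
            this.allocator = allocator;
            this.initialSize = initialSize;
            this.eagerReset = eagerReset;
            this.array = allocator.allocate(initialSize);
        }

        /** Called as part of a spill: drops the pointer array. */
        void reset() {
            array = null;
            pos = 0;
            if (eagerReset) {
                // Eager variant: reallocate immediately, which can fail mid-spill.
                array = allocator.allocate(initialSize);
            }
        }

        void insert(long pointer) {
            if (array == null) {
                // On-demand variant: allocate only when a record actually arrives.
                array = allocator.allocate(initialSize);
            }
            if (pos == array.length) {
                long[] bigger = allocator.allocate(array.length * 2);
                System.arraycopy(array, 0, bigger, 0, pos);
                array = bigger;
            }
            array[pos++] = pointer;
        }
    }

    /** Simulates a spill while the allocator refuses; returns true if no error escapes. */
    static boolean spillSurvives(boolean eagerReset) {
        ToyAllocator allocator = new ToyAllocator();
        ToySorter sorter = new ToySorter(allocator, 4, eagerReset);
        sorter.insert(1L);
        allocator.available = false; // assumption: memory for the large array is still held
        try {
            sorter.reset();          // reset happens during the spill
        } catch (ToyOutOfMemoryError e) {
            return false;
        }
        allocator.available = true;  // assumption: memory becomes obtainable again later
        sorter.insert(2L);           // on-demand allocation succeeds here
        return true;
    }

    public static void main(String[] args) {
        System.out.println("eager reset survives spill: " + spillSurvives(true));
        System.out.println("lazy reset survives spill: " + spillSurvives(false));
    }
}
```

In this toy model the eager variant fails during the spill, while the on-demand variant defers the allocation until the next insert, by which time the allocator is assumed to be able to satisfy it.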