When we wish to sort within partitions, we produce an UnsafeExternalRowSorter. This contains an UnsafeExternalSorter, which contains the UnsafeExternalRowComparator.
The UnsafeExternalSorter adds a task completion listener which performs any additional required cleanup. The upshot of this is that we maintain a reference to the UnsafeExternalRowSorter.RowComparator until the end of the task.
The RowComparator looks like
which means that this will contain references to the last baseObjs that were passed in, and without tracking them for purposes of memory allocation.
We have a job which sorts within partitions and then coalesces partitions - this has a tendency to OOM because of the references to old UnsafeRows that were used during the sorting.
Attached is a screenshot of a memory dump during a task - our JVM has two executor threads.
It can be seen that we have 2 references inside of row iterators, and 11 more which are only known in the task completion listener or as part of memory management.