Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
1.9.0
None
None
Description
The External Sort operator (batch) can spill to disk when it runs out of memory. To do so, it uses a class called BatchGroup. Later, when the sort merges spilled runs, BatchGroup reads the run back into memory one batch at a time.
If an OOM error occurs during the read operation, the partially read batches leak because they are never released. The fragment executor then reports a memory leak error while shutting down the query.
This error has probably not been caught until now because the BatchGroup code does not make use of the fault injector. Elsewhere in the external sort, we use the fault injector to insert a (simulated) OOM exception so we can determine if clean-up occurs properly. No such fault injection code exists in BatchGroup.
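The failure mode and the missing test hook can be illustrated with a minimal, self-contained sketch. It does not use Drill's classes or its fault injector: `TrackingAllocator`, `Buffer`, `SimulatedOutOfMemoryException`, `readBatch`, and `injectFailureAt` are hypothetical names invented for this example. The idea is that a read which allocates several buffers should release the ones it already holds if a later allocation fails, and that an injected failure lets a test confirm nothing is left outstanding.

```java
import java.util.ArrayList;
import java.util.List;

class BatchReadCleanupSketch {

    /** Hypothetical exception standing in for an out-of-memory condition. */
    static final class SimulatedOutOfMemoryException extends RuntimeException {
        SimulatedOutOfMemoryException(String message) {
            super(message);
        }
    }

    /** Hypothetical allocator that counts buffers not yet released. */
    static final class TrackingAllocator {
        private int outstanding = 0;
        private int allocations = 0;
        private int failOnAllocation = -1; // -1 means no failure is injected

        /** Makes the n-th call to allocate() throw (1-based). */
        void injectFailureAt(int n) {
            failOnAllocation = n;
        }

        Buffer allocate() {
            allocations++;
            if (allocations == failOnAllocation) {
                throw new SimulatedOutOfMemoryException(
                        "injected failure at allocation " + allocations);
            }
            outstanding++;
            return new Buffer(this);
        }

        int outstanding() {
            return outstanding;
        }
    }

    /** Hypothetical buffer; close() returns it to the allocator exactly once. */
    static final class Buffer implements AutoCloseable {
        private final TrackingAllocator owner;
        private boolean released = false;

        Buffer(TrackingAllocator owner) {
            this.owner = owner;
        }

        @Override
        public void close() {
            if (!released) {
                released = true;
                owner.outstanding--;
            }
        }
    }

    /** Reads one batch of vectorCount buffers, releasing partial work on failure. */
    static List<Buffer> readBatch(TrackingAllocator allocator, int vectorCount) {
        List<Buffer> buffers = new ArrayList<>();
        try {
            for (int i = 0; i < vectorCount; i++) {
                buffers.add(allocator.allocate());
            }
            return buffers;
        } catch (RuntimeException e) {
            // Without this loop, the buffers allocated so far would leak.
            for (Buffer b : buffers) {
                b.close();
            }
            throw e;
        }
    }

    /** Runs one read with an injected failure and reports unreleased buffers. */
    static int outstandingAfterInjectedFailure(int vectorCount, int failAt) {
        TrackingAllocator allocator = new TrackingAllocator();
        allocator.injectFailureAt(failAt);
        try {
            for (Buffer b : readBatch(allocator, vectorCount)) {
                b.close(); // normal path: the caller releases the batch
            }
        } catch (SimulatedOutOfMemoryException e) {
            // Expected when failAt <= vectorCount; readBatch already cleaned up.
        }
        return allocator.outstanding();
    }

    public static void main(String[] args) {
        System.out.println("outstanding buffers: "
                + outstandingAfterInjectedFailure(4, 3));
    }
}
```

Removing the `catch` block in `readBatch` reproduces the leak in this toy model: the allocator still counts the buffers allocated before the injected failure, which is analogous to the memory leak error reported at query shutdown.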