The code of Partition::Spill() for both PHJ and PAGG does not release memory as soon as it can, which increases the chances that it will not be able to SwitchToIoBuffers() the probe_rows_ or the unaggregated_rows_, respectively. That is, the code of PHJ::PartitionSpill() is (PAGG's is similar):
It looks like we can further reduce the memory needed when spilling a partition by doing the following:
(a) Destroy the hash table
(b) SwitchToIoBuffers() the build_rows_ or the aggregated_rows_
(c) If that succeeded, or the stream had already switched, UnpinStream() that stream so that we free up a potentially large number of buffers
(d) SwitchToIoBuffers() the probe_rows_ or the unaggregated_rows_
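
To illustrate why the order of steps (a) to (d) matters, here is a minimal sketch using a toy buffer counter. It is not the Impala code: BufferPool, Stream, Partition, SpillReleasingEarly(), SpillReleasingLate() and MakeTightPartition() are all hypothetical names, and the model assumes that switching a stream costs exactly one IO-sized buffer and that unpinning releases every buffer the stream holds. The "late" ordering is only a contrast case and is not claimed to match the current implementation.

```cpp
#include <cassert>

// Hypothetical pool that only counts free IO-sized buffers.
// This is a toy model, not Impala's buffer manager.
struct BufferPool {
  int free_buffers = 0;
  bool Acquire(int n) {
    if (n > free_buffers) return false;
    free_buffers -= n;
    return true;
  }
  void Release(int n) {
    assert(n >= 0);
    free_buffers += n;
  }
};

// Hypothetical stand-in for a tuple stream.
struct Stream {
  BufferPool* pool;
  bool using_small_buffers;
  int pinned_io_buffers;

  // Assumption: switching needs one free IO-sized buffer.
  // Returns true if the stream had already switched.
  bool SwitchToIoBuffers() {
    if (!using_small_buffers) return true;
    if (!pool->Acquire(1)) return false;
    using_small_buffers = false;
    ++pinned_io_buffers;
    return true;
  }

  // Assumption: unpinning returns all pinned buffers to the pool.
  void UnpinStream() {
    pool->Release(pinned_io_buffers);
    pinned_io_buffers = 0;
  }
};

// Hypothetical partition: a hash table plus a build and a probe stream.
struct Partition {
  BufferPool* pool;
  int hash_table_buffers;
  Stream build_rows;
  Stream probe_rows;

  void DestroyHashTable() {
    pool->Release(hash_table_buffers);
    hash_table_buffers = 0;
  }
};

// Steps (a) to (d): free memory before asking for more.
bool SpillReleasingEarly(Partition& p) {
  p.DestroyHashTable();                                  // (a)
  if (!p.build_rows.SwitchToIoBuffers()) return false;   // (b)
  p.build_rows.UnpinStream();                            // (c)
  return p.probe_rows.SwitchToIoBuffers();               // (d)
}

// Contrast case: the probe stream asks for a buffer before anything is freed.
bool SpillReleasingLate(Partition& p) {
  bool probe_ok = p.probe_rows.SwitchToIoBuffers();
  p.DestroyHashTable();
  if (p.build_rows.SwitchToIoBuffers()) p.build_rows.UnpinStream();
  return probe_ok;
}

// A partition in a pool with no free buffers: the hash table holds 2 buffers,
// the build stream has already switched and pins 3, and the probe stream is
// still on small buffers.
Partition MakeTightPartition(BufferPool* pool) {
  return Partition{pool, 2, Stream{pool, false, 3}, Stream{pool, true, 0}};
}
```

In this model, with no free buffers at the start, SpillReleasingEarly() succeeds in switching the probe stream because the hash table and the build stream give their buffers back first, whereas SpillReleasingLate() fails on the same partition even though the same amount of memory is eventually released.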
We have not noticed this problem until now because we used to switch all streams of all partitions to IO-sized buffers the first time the small buffers of any stream filled up. That is, the small buffers patch exposes this problem as well (with that patch, a couple of test_spilling tests needed a lot more memory to complete successfully).