The loop in GetRowsFromPartition() invokes Finalize()/Serialize() on all of the rows in output_partition_, which may make local allocations in output_partition_->agg_fn_evals. These allocations are not freed until the partition is destroyed.
If the having conjuncts are very selective, this can result in a lot of excess memory usage. E.g. the following query results in a lot of non-buffer-pool memory overhead: