The new unified memory management model introduced in SPARK-10983 uncovered many brittle tests that rely on arbitrary thresholds to detect spilling. Some tests do not even assert that spilling occurred.
We should go through every test of spilling behavior and fix the ones that are wrong; some of them are definitely incorrect. Potential suspects: