Now that we spill big bags first, I've observed that there are still cases where a big container bag is spilled, so its mContent becomes empty, but most of its inner bags' WeakReferences haven't been cleaned up by the GC yet. In such cases, if we haven't freed enough memory, those inner bags will be spilled unnecessarily, even though all their contents were already spilled as part of the big bag. There are possibly two simple ways to solve this:
1) In SpillableMemoryManager, call Thread.yield() between spills. This should give the GC some more time to clean up without degrading performance too much. However, if the main execution thread doesn't produce any bags (e.g. a map task where all keys and values are tuples and atomic data), yielding just gives the main execution thread more time to use up memory even faster.
2) Check the size of the spillable currently being spilled. If it is larger than a constant X, call System.gc(). This is safer than (1), but because we call the GC explicitly more often, it may have some impact on performance. However, since spilling small files is much slower than calling System.gc(), this approach should generally give better performance.
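A rough sketch of (2), with (1) as an optional flag, might look like the following. Everything in it is hypothetical: the Spillable interface, the spillUntil method and the gcThreshold parameter (standing in for X) are illustration names, not Pig's actual SpillableMemoryManager API. It assumes registered bags are held through WeakReferences, spills the largest first, and counts the size observed at the start of the pass as the memory freed.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class SpillSketch {

    /** Hypothetical, minimal stand-in for a spillable bag (not Pig's real interface). */
    public interface Spillable {
        long getMemorySize(); // estimated bytes currently held in memory
        void spill();         // write the in-memory contents to disk
    }

    /** A registered weak reference plus the size observed when the spill pass started. */
    private static final class Candidate {
        final WeakReference<Spillable> ref;
        final long size;

        Candidate(WeakReference<Spillable> ref, long size) {
            this.ref = ref;
            this.size = size;
        }
    }

    /**
     * Spills largest-first until at least {@code target} estimated bytes are freed.
     * Option (2): after spilling anything larger than {@code gcThreshold} ("X"),
     * request a GC so weak references to now-unreachable inner bags may be cleared
     * before the loop reaches them. Option (1): optionally yield between spills.
     * Returns the estimated number of bytes freed.
     */
    public static long spillUntil(List<WeakReference<Spillable>> refs, long target,
                                  long gcThreshold, boolean yieldBetweenSpills) {
        // Snapshot the live spillables and their sizes (assumption: size estimate = bytes freed).
        List<Candidate> candidates = new ArrayList<>();
        for (WeakReference<Spillable> ref : refs) {
            Spillable s = ref.get();
            if (s != null) {
                candidates.add(new Candidate(ref, s.getMemorySize()));
            }
        }
        // Big bags first.
        candidates.sort((a, b) -> Long.compare(b.size, a.size));

        long freed = 0;
        for (Candidate c : candidates) {
            Spillable s = c.ref.get();
            if (s == null) {
                continue; // collected since the snapshot, e.g. an inner bag of a spilled container
            }
            s.spill();
            freed += c.size;
            if (freed >= target) {
                break; // enough memory freed, no need to GC or yield
            }
            if (c.size > gcThreshold) {
                // Only a request: the JVM may ignore it (e.g. with -XX:+DisableExplicitGC).
                System.gc();
            }
            if (yieldBetweenSpills) {
                Thread.yield(); // scheduler hint only
            }
        }
        return freed;
    }
}
```

Note that System.gc() is only a request, so a weak reference to an unreachable inner bag is still not guaranteed to be cleared by the time the loop reaches it; the sketch only makes that more likely after each large spill.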
I don't really have a processing task that spills heavily enough to test this. Could someone please try (2) out?