Details
Improvement
Status: Closed
Major
Resolution: Won't Fix
0.7.0-incubating
None
Description
I suggest reworking the memory management.
Right now, it works the following way: When a program is submitted, the system counts how many memory-consuming operations it contains (sort, hash, explicit cache, ...). Each one is assigned a static relative fraction of the memory that the TaskManager provides.
This planning is very conservative, mostly because with the streaming runtime all operations may be running concurrently. In practice they mostly are not, so we waste memory by being too conservative.
To make the most of the available memory, I suggest making the management adaptive:
- Operators need to be able to request memory bit by bit
- Operators need to be able to release memory on request. The sorter / hash table / cache do this naturally by spilling.
- Memory has to be redistributed between operators when new requesters arrive.
This also plays nicely with the idea of leaving all non-assigned memory to intermediate results, to allow for maximum caching of historic intermediate results.
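The three requirements listed above could look roughly like the following. This is only a minimal sketch to illustrate the idea, not existing Flink code: `MemoryConsumer`, `AdaptiveMemoryPool`, and `SpillingOperator` are hypothetical names, memory is counted in abstract fixed-size segments, and "spilling" is simulated by decrementing a counter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: an operator that can give memory back on request, e.g. by spilling. */
interface MemoryConsumer {
    /** Asked to free up to `segments` segments; returns how many were actually freed. */
    int releaseOnRequest(int segments);
}

/** Hypothetical pool that hands out segments one at a time instead of static fractions. */
class AdaptiveMemoryPool {
    private int free;
    private final List<MemoryConsumer> consumers = new ArrayList<>();

    AdaptiveMemoryPool(int totalSegments) {
        this.free = totalSegments;
    }

    void register(MemoryConsumer consumer) {
        consumers.add(consumer);
    }

    /** Grants one segment; if none is free, asks the other consumers to release one. */
    boolean requestSegment(MemoryConsumer requester) {
        if (free == 0) {
            for (MemoryConsumer c : consumers) {
                if (c != requester) {
                    free += c.releaseOnRequest(1);
                    if (free > 0) {
                        break;
                    }
                }
            }
        }
        if (free == 0) {
            return false; // nobody could release memory
        }
        free--;
        return true;
    }

    int freeSegments() {
        return free;
    }
}

/** Toy stand-in for a sorter / hash table / cache that only tracks how much it holds. */
class SpillingOperator implements MemoryConsumer {
    private final AdaptiveMemoryPool pool;
    private int held;

    SpillingOperator(AdaptiveMemoryPool pool) {
        this.pool = pool;
        pool.register(this);
    }

    /** Requests memory bit by bit: one more segment per call. */
    boolean grow() {
        if (pool.requestSegment(this)) {
            held++;
            return true;
        }
        return false;
    }

    /** A real operator would spill data to disk here before giving up the segments. */
    @Override
    public int releaseOnRequest(int segments) {
        int released = Math.min(segments, held);
        held -= released;
        return released;
    }

    int held() {
        return held;
    }
}
```

With a pool of 4 segments, a first operator can grow to all 4 while it runs alone; when a second operator then calls `grow()`, the pool asks the first one to release a segment and hands it over. The redistribution policy here (take one segment from the first registered consumer that can release) is deliberately naive; a real policy would need fairness rules and per-operator minimums.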