Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
We have seen the in-memory store buffered too much data and eventually get OOM. Generally speaking, OOM has no real indication of the underlying problem and increases the difficulty for user debugging, since the failed thread may not be the actual culprit which causes the explosion. If we could get better protection to avoid hitting memory limit, or at least giving out a clear guide, the end user debugging would be much simpler.
To make it work, we need to enforce a certain memory limit below heap size and take actions when hitting it. The first question would be, whether we set a numeric limit, such as 100MB or 500MB, or a percentile limit, such as 60% or 80% of total memory.
The second question is about the action itself. One approach would be crashing the store immediately and inform the user to increase their application capacity. The second approach would be opening up an on-disk store spontaneously and offload the data to it.
Personally I'm in favor of approach #2 because it has minimum impact to the on-going application. However it is more complex and potentially requires significant works to define the proper behavior such as the default store configuration, how to manage its lifecycle, etc.