KAFKA-10237

Properly handle in-memory stores OOM


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      We have seen the in-memory store buffer too much data and eventually hit an OutOfMemoryError. Generally speaking, an OOM gives no real indication of the underlying problem and makes user debugging harder, since the thread that fails may not be the actual culprit behind the memory growth. If we could provide better protection against hitting the memory limit, or at least a clear diagnostic, end-user debugging would be much simpler.

      To make this work, we need to enforce a certain memory limit below the heap size and take action when it is hit. The first question is whether we set a numeric limit, such as 100MB or 500MB, or a percentage limit, such as 60% or 80% of the total heap.
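      As a rough illustration of those two options, the limit could be derived either from an absolute byte count or from a fraction of the JVM max heap. The class and method names below are purely a hypothetical sketch, not existing Kafka Streams configs:

{code:java}
// Hypothetical sketch only; neither of these knobs exists in Kafka Streams today.
public final class StoreMemoryLimit {

    // Option 1: an absolute limit, e.g. 100MB or 500MB.
    public static long absoluteLimit(final long bytes) {
        return bytes;
    }

    // Option 2: a fraction of the maximum heap the JVM will attempt to use (-Xmx).
    public static long fractionOfHeap(final double fraction) {
        return (long) (Runtime.getRuntime().maxMemory() * fraction);
    }

    public static void main(final String[] args) {
        System.out.println("100MB limit: " + absoluteLimit(100L * 1024 * 1024));
        System.out.println("60% of heap: " + fractionOfHeap(0.6));
    }
}
{code}

      A percentage-based limit adapts automatically when the user resizes the heap, while an absolute limit is more predictable when multiple stores share one JVM.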

      The second question is about the action itself. One approach would be to fail the store immediately and tell the user to increase their application capacity. The other would be to transparently open an on-disk store and offload the data to it.

      Personally I'm in favor of approach #2 because it has minimal impact on the running application. However, it is more complex and potentially requires significant work to define the proper behavior, such as the default store configuration, how to manage its lifecycle, etc.
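      To make approach #2 concrete, here is a minimal, purely illustrative sketch (SpillingInMemoryStore, SpillTarget and everything else below are made-up names, not part of the Kafka Streams API): an in-memory buffer that tracks its approximate byte usage and, once the configured limit is exceeded, writes new entries to a disk-backed delegate rather than letting the heap grow until the JVM throws OutOfMemoryError. Approach #1 would throw a descriptive exception at the same decision point instead.

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of approach #2; not an actual Kafka Streams store implementation.
public final class SpillingInMemoryStore {

    /** Stand-in for whatever on-disk store (e.g. RocksDB) the data would be offloaded to. */
    public interface SpillTarget {
        void put(byte[] key, byte[] value);
        byte[] get(byte[] key);
    }

    private final Map<ByteKey, byte[]> inMemory = new HashMap<>();
    private final SpillTarget spillTarget;
    private final long maxBytes;
    private long approximateBytes = 0L;

    public SpillingInMemoryStore(final SpillTarget spillTarget, final long maxBytes) {
        this.spillTarget = spillTarget;
        this.maxBytes = maxBytes;
    }

    public void put(final byte[] key, final byte[] value) {
        final ByteKey wrapped = new ByteKey(key);
        final byte[] previous = inMemory.get(wrapped);
        if (previous != null) {
            // Key is already buffered in memory: update in place and adjust the accounting.
            inMemory.put(wrapped, value);
            approximateBytes += value.length - previous.length;
            return;
        }
        final long entrySize = (long) key.length + value.length;
        if (approximateBytes + entrySize > maxBytes) {
            // Approach #2: offload new entries to the disk-backed store instead of risking
            // an OutOfMemoryError. Approach #1 would instead throw an exception here with a
            // clear message telling the user to increase the application capacity.
            spillTarget.put(key, value);
            return;
        }
        inMemory.put(wrapped, value);
        approximateBytes += entrySize;
    }

    public byte[] get(final byte[] key) {
        final byte[] value = inMemory.get(new ByteKey(key));
        return value != null ? value : spillTarget.get(key);
    }

    /** Wrapper so byte[] keys compare by content rather than by reference. */
    private static final class ByteKey {
        private final byte[] bytes;

        ByteKey(final byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public boolean equals(final Object other) {
            return other instanceof ByteKey && Arrays.equals(bytes, ((ByteKey) other).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }
}
{code}

      A real implementation would also have to cover deletes, range scans across both tiers, changelog restoration, and the lifecycle of the spilled store, which is exactly the "significant work" mentioned above.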

       


          People

            Assignee: Unassigned
            Reporter: Boyang Chen (bchen225242)
            Votes: 0
            Watchers: 3
