Flink / FLINK-12692

Support disk spilling in HeapKeyedStateBackend


Details

    Description

      HeapKeyedStateBackend is one of the two KeyedStateBackends in Flink. Because state lives as Java objects on the heap and de/serialization only happens during state snapshot and restore, it outperforms RocksDBKeyedStateBackend when all data can reside in memory.

      However, HeapKeyedStateBackend also has shortcomings. The most painful one is the difficulty of estimating the maximum heap size (Xmx) to set, and jobs suffer from GC impact once heap memory is not enough to hold all state data. There are several (inevitable) causes for such a scenario, including (but not limited to):

      • Memory overhead of the Java object representation (tens of times the serialized data size).
      • Data flood caused by burst traffic.
      • Data accumulation caused by source malfunction.

      To resolve this problem, we propose supporting the spilling of state data to disk before heap memory is exhausted. We will monitor heap usage and choose the coldest data to spill, then automatically reload it when heap memory is regained after data removal or TTL expiration.
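The spill-and-reload cycle described above can be illustrated with a minimal sketch. This is not the proposed Flink implementation: the class `SpillableStateSketch`, its methods, the two thresholds, and the one-file-per-key layout are all hypothetical, and "coldest" is approximated here by least-recently-accessed order. Heap usage is injected as a supplier so the policy can be exercised deterministically; `jvmHeapUsage()` shows one way a real reading might be obtained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

/** Hypothetical sketch of spilling cold state to disk; not Flink code. */
class SpillableStateSketch {
    // Access-ordered map: iteration starts at the least recently accessed (coldest) key.
    private final LinkedHashMap<String, String> onHeap = new LinkedHashMap<>(16, 0.75f, true);
    private final Map<String, Path> onDisk = new HashMap<>();
    private final DoubleSupplier heapUsage; // fraction of heap in use, 0.0 to 1.0
    private final double spillAbove;        // assumed threshold: spill when usage exceeds this
    private final double loadBelow;         // assumed threshold: reload when usage drops below this
    private final Path spillDir;
    private long fileCounter = 0;

    SpillableStateSketch(DoubleSupplier heapUsage, double spillAbove, double loadBelow) {
        this.heapUsage = heapUsage;
        this.spillAbove = spillAbove;
        this.loadBelow = loadBelow;
        try {
            this.spillDir = Files.createTempDirectory("spill-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** One possible real heap reading; returns 0.0 if the JVM reports no maximum. */
    static double jvmHeapUsage() {
        MemoryUsage u = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return u.getMax() > 0 ? (double) u.getUsed() / u.getMax() : 0.0;
    }

    void put(String key, String value) {
        deleteSpilled(key);      // a fresh write supersedes any spilled copy
        onHeap.put(key, value);
    }

    String get(String key) {
        String v = onHeap.get(key); // also marks the key as recently accessed
        if (v != null) return v;
        Path file = onDisk.get(key);
        if (file == null) return null;
        try {                       // served from disk without promoting to heap
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isOnHeap(String key) {
        return onHeap.containsKey(key);
    }

    /** One monitoring round: spill the coldest entry under pressure, reload one when memory is free. */
    void rebalance() {
        double usage = heapUsage.getAsDouble();
        try {
            if (usage > spillAbove && !onHeap.isEmpty()) {
                Iterator<Map.Entry<String, String>> it = onHeap.entrySet().iterator();
                Map.Entry<String, String> coldest = it.next();
                Path file = spillDir.resolve("state-" + (fileCounter++));
                Files.write(file, coldest.getValue().getBytes(StandardCharsets.UTF_8));
                onDisk.put(coldest.getKey(), file);
                it.remove();
            } else if (usage < loadBelow && !onDisk.isEmpty()) {
                String key = onDisk.keySet().iterator().next();
                Path file = onDisk.remove(key);
                onHeap.put(key, new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void deleteSpilled(String key) {
        Path file = onDisk.remove(key);
        if (file == null) return;
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, a caller would invoke `rebalance()` periodically, for example with `new SpillableStateSketch(SpillableStateSketch::jvmHeapUsage, 0.8, 0.6)`. Keeping the reload threshold below the spill threshold leaves a gap between the two, which avoids repeatedly spilling and reloading the same entry when usage hovers near a single limit. A real backend would also need to account for snapshots, serializers, and key groups, none of which are modeled here.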

      For more details, please refer to the design doc and mailing list discussion.

          People

            Assignee: Unassigned
            Reporter: Yu Li (liyu)

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 10m
