  1. Flink
  2. FLINK-4266

Cassandra SplitOver Statebackend

Details

    • Type: New Feature
    • Status: Reopened
    • Priority: Not a Priority
    • Resolution: Unresolved
    • 1.3.0
    • None

    Description

      The current FileSystem state backend limits total state size to the available local disk space. This becomes a problem when checkpoint state has to scale beyond local disk, for example in long-running tasks that hold window contents for long periods of time. Pipelines need to split state out to durable remote storage, possibly replicated to different data centers.

      We have drafted a design that uses the checkpoint ID as a monotonically increasing logical timestamp and performs range queries to fetch evicted state key/value pairs. We also introduce a checkpoint-time commit and an eviction threshold, so that "hot" state does not hit the remote database on every update between adjacent checkpoints: update logs are tracked and merged, and batch updates are issued only at checkpoint time. Lastly, we are looking for an eviction policy that can identify "hot keys" in the k/v state and lazily load "cold keys" from remote storage (e.g. Cassandra).
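      To make the idea above more concrete, here is a minimal sketch of the write path, assuming an in-memory map as a stand-in for the remote store. The class `SplitOverStateSketch` and all of its method names are hypothetical and are not part of Flink or of any Cassandra driver. An LRU policy is also assumed for eviction, since the actual policy is still open.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; the "remote" map stands in for a Cassandra table. */
class SplitOverStateSketch {
    // Stand-in for remote storage: key -> (checkpointId -> value).
    private final Map<String, TreeMap<Long, String>> remote = new HashMap<>();
    // Local "hot" state kept in access order, so the least recently used key comes first.
    private final LinkedHashMap<String, String> local = new LinkedHashMap<>(16, 0.75f, true);
    // Merged update log: only the latest value per key since the last checkpoint.
    private final Map<String, String> dirty = new HashMap<>();
    private final int evictionThreshold;
    private int remoteBatches = 0;

    SplitOverStateSketch(int evictionThreshold) {
        this.evictionThreshold = evictionThreshold;
    }

    /** Updates touch only local structures; nothing is sent to the remote store here. */
    void put(String key, String value) {
        local.put(key, value);
        dirty.put(key, value); // a later update to the same key overwrites the earlier one
    }

    /** Reads local state first and lazily loads a cold key from the remote store. */
    String get(String key) {
        String value = local.get(key);
        if (value == null) {
            TreeMap<Long, String> versions = remote.get(key);
            if (versions != null) {
                value = versions.lastEntry().getValue();
                local.put(key, value);
            }
        }
        return value;
    }

    /** Flushes merged updates as one batch tagged with the checkpoint ID, then evicts. */
    void checkpoint(long checkpointId) {
        if (!dirty.isEmpty()) {
            for (Map.Entry<String, String> e : dirty.entrySet()) {
                remote.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                      .put(checkpointId, e.getValue());
            }
            remoteBatches++;
            dirty.clear();
        }
        // Everything local is now durable, so entries over the threshold can be dropped safely.
        Iterator<String> it = local.keySet().iterator();
        while (local.size() > evictionThreshold && it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    /** Range-style lookup: newest version written at or before the given checkpoint. */
    String getAsOf(String key, long checkpointId) {
        TreeMap<Long, String> versions = remote.get(key);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> e = versions.floorEntry(checkpointId);
        return e == null ? null : e.getValue();
    }

    int remoteBatches() { return remoteBatches; }

    int localSize() { return local.size(); }
}
```

      In this sketch eviction happens only right after a flush, so an evicted key can always be reloaded from the remote store. A real backend would also need to bound local state between checkpoints, which is left out here.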

      For now, we don't have a good story for data retirement. We might have to keep the data forever, until someone manually runs a command to clean up the state data of each job. Some of these features might be related to the incremental checkpointing feature; we hope to align with the effort there as well.

      Comments are welcome. I will try to put up a draft design doc after gathering some feedback.

          People

            foxss Chen Qin
            Votes: 3
            Watchers: 19

            Dates

              Created:
              Updated: