  1. Flink
  2. FLINK-12884 FLIP-144: Native Kubernetes HA Service
  3. FLINK-19544

Implement CheckpointRecoveryFactory based on Kubernetes API

Details

    Description

      • CheckpointRecoveryFactory
      • Stores checkpoint metadata in ZooKeeper/ConfigMap for checkpoint recovery.
      • Stores the latest checkpoint ID counter.

      Each component (Dispatcher, ResourceManager, JobManager, RestEndpoint) will have a dedicated ConfigMap. All the HA information relevant to a specific component will be stored in a single ConfigMap. The JobManager's ConfigMap would then contain the current leader, the pointers to the checkpoints, and the checkpoint ID counter. Since "Get (check the leader) and Update (write back to the ConfigMap)" is a transactional operation, this completely solves the concurrent modification issues without using the "lock-and-release" approach used with ZooKeeper.
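
      The "Get (check the leader) and Update" step described above can be sketched as an optimistic, version-checked write. This is only an illustration under stated assumptions, not the implementation proposed in this ticket: FakeConfigMapStore is a hypothetical in-memory stand-in for a ConfigMap, the key names "leader" and "checkpointIdCounter" are invented for the example, and the starting counter value of 1 is assumed. A real version would go through a Kubernetes client and rely on the API server rejecting writes that carry a stale resourceVersion.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch only: an in-memory model of one component's HA ConfigMap. */
public class ConfigMapCheckAndUpdateSketch {

    // Hypothetical key names, chosen for this example only.
    static final String LEADER_KEY = "leader";
    static final String COUNTER_KEY = "checkpointIdCounter";

    /** Immutable view of the ConfigMap data together with its version. */
    static final class Snapshot {
        final long resourceVersion;
        final Map<String, String> data;

        Snapshot(long resourceVersion, Map<String, String> data) {
            this.resourceVersion = resourceVersion;
            this.data = Collections.unmodifiableMap(new HashMap<>(data));
        }
    }

    /** Hypothetical stand-in for the API server: a write succeeds only on a version match. */
    static final class FakeConfigMapStore {
        private Snapshot current = new Snapshot(0L, new HashMap<>());

        synchronized Snapshot get() {
            return current;
        }

        synchronized boolean replace(long expectedVersion, Map<String, String> newData) {
            if (current.resourceVersion != expectedVersion) {
                return false; // someone else modified the ConfigMap in between
            }
            current = new Snapshot(expectedVersion + 1, newData);
            return true;
        }
    }

    /**
     * Get, check the leader, then update. Returns the counter value before the
     * increment, or empty if the caller is not the leader recorded in the ConfigMap.
     */
    static Optional<Long> getAndIncrementCounter(FakeConfigMapStore store, String myLeaderId) {
        while (true) {
            Snapshot snapshot = store.get();
            if (!myLeaderId.equals(snapshot.data.get(LEADER_KEY))) {
                return Optional.empty(); // lost leadership: do not write
            }
            // Assumed starting value of 1 when no counter has been stored yet.
            long currentId = Long.parseLong(snapshot.data.getOrDefault(COUNTER_KEY, "1"));
            Map<String, String> updated = new HashMap<>(snapshot.data);
            updated.put(COUNTER_KEY, Long.toString(currentId + 1));
            if (store.replace(snapshot.resourceVersion, updated)) {
                return Optional.of(currentId);
            }
            // Version conflict: re-read and re-check the leader before retrying.
        }
    }
}
```

      Because the leader check and the write are tied to the same snapshot version, a former leader that lost leadership between the read and the write fails the version check, re-reads, and then stops at the leader check instead of overwriting the new leader's data. No separate lock has to be acquired or released.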

          People

            wangyang0918 Yang Wang