Flink / FLINK-12751

Create file based HA support

Details

    Description

      Flink currently supports high availability (HA) either through ZooKeeper or through a custom factory class.
      This issue proposes an additional HA implementation based on a file system backed by a Kubernetes
      PersistentVolumeClaim (PVC). The idea behind this implementation is as follows:

      • Because the implementation assumes a single JobManager instance (JobManager selection and restarts are handled by a Kubernetes Deployment with one replica),
        URL management is done using the StandaloneHaServices implementation (for a cluster) and the EmbeddedHaServices implementation (for a mini cluster).
      • To manage submitted job graphs, checkpoint counters, and completed checkpoints, the implementation uses the following file system layout:
         ha -----> root of the HA data
           checkpointcounter -----> checkpoint counter folder
             <job ID> -----> job id folder
               <counter file> -----> counter file
             <another job ID> -----> another job id folder
             ...........
           completedCheckpoint -----> completed checkpoint folder
             <job ID> -----> job id folder
               <checkpoint file> -----> checkpoint file
               <another checkpoint file> -----> checkpoint file
               ...........
             <another job ID> -----> another job id folder
             ...........
           submittedJobGraph -----> submitted graph folder
             <job ID> -----> job id folder
               <graph file> -----> graph file
             <another job ID> -----> another job id folder
             ...........
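
      To make the layout concrete, here is a minimal sketch of how paths inside the HA root could be resolved per job. The class `HaLayout` and its method names are hypothetical and are not part of the proposed patch; only the three folder names come from the layout above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that resolves locations inside the HA directory layout. */
final class HaLayout {
    // Folder names as given in the issue description.
    static final String CHECKPOINT_COUNTER = "checkpointcounter";
    static final String COMPLETED_CHECKPOINT = "completedCheckpoint";
    static final String SUBMITTED_JOB_GRAPH = "submittedJobGraph";

    // The "ha" root folder, for example a directory on the mounted PVC.
    private final Path root;

    HaLayout(Path root) {
        this.root = root;
    }

    /** Folder that holds the counter file of one job. */
    Path checkpointCounterDir(String jobId) {
        return root.resolve(CHECKPOINT_COUNTER).resolve(jobId);
    }

    /** Folder that holds the completed checkpoint files of one job. */
    Path completedCheckpointDir(String jobId) {
        return root.resolve(COMPLETED_CHECKPOINT).resolve(jobId);
    }

    /** Folder that holds the submitted job graph file of one job. */
    Path submittedJobGraphDir(String jobId) {
        return root.resolve(SUBMITTED_JOB_GRAPH).resolve(jobId);
    }
}
```

      With a root of `ha`, `checkpointCounterDir("job1")` resolves to `ha/checkpointcounter/job1`. Keeping one folder per job ID under each category means that the data of a single job can be listed or removed without touching other jobs.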
        

      The implementation modifies two existing Flink files:

      • HighAvailabilityServicesUtils - added `FILESYSTEM` option for picking HA service
      • HighAvailabilityMode - added `FILESYSTEM` to available HA options.
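
      The effect of these two changes can be illustrated with a self-contained sketch. The enum `HaMode` below is a simplified stand-in for `HighAvailabilityMode`, and `HaServiceSelector` stands in for the selection logic in `HighAvailabilityServicesUtils`; both names, the `fromConfigValue` method, and the fallback to `FACTORY_CLASS` for unrecognized values are assumptions of this sketch, not the actual Flink code.

```java
import java.util.Locale;

/** Simplified stand-in for HighAvailabilityMode, extended with FILESYSTEM. */
enum HaMode {
    NONE, ZOOKEEPER, FACTORY_CLASS, FILESYSTEM;

    /** Maps a configured mode string to a constant (assumed parsing rules). */
    static HaMode fromConfigValue(String value) {
        if (value == null || value.trim().isEmpty()) {
            return NONE;
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Assumption: anything else is treated as a custom factory class name.
            return FACTORY_CLASS;
        }
    }
}

/** Stand-in for the dispatch that picks an HA services implementation. */
final class HaServiceSelector {
    static String describe(HaMode mode) {
        switch (mode) {
            case NONE:
                return "standalone services, no HA";
            case ZOOKEEPER:
                return "ZooKeeper based HA services";
            case FILESYSTEM:
                return "file system based HA services";
            case FACTORY_CLASS:
                return "HA services created by a custom factory";
            default:
                throw new IllegalStateException("Unknown mode: " + mode);
        }
    }
}
```

      Here `HaMode.fromConfigValue("filesystem")` yields `FILESYSTEM`, which the selector then maps to the file system based services.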

      The actual implementation adds the following classes:

      • `FileSystemHAServices` - an implementation of a `HighAvailabilityServices` for file system
      • `FileSystemUtils` - support class for creation of runtime components.
      • `FileSystemStorageHelper` - file system operations implementation for filesystem based HA
      • `FileSystemCheckpointRecoveryFactory` - an implementation of a `CheckpointRecoveryFactory` for file system
      • `FileSystemCheckpointIDCounter` - an implementation of a `CheckpointIDCounter` for file system
      • `FileSystemCompletedCheckpointStore` - an implementation of a `CompletedCheckpointStore` for file system
      • `FileSystemSubmittedJobGraphStore` - an implementation of a `SubmittedJobGraphStore` for file system
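
      As an illustration of what a file-backed component might look like, the following sketch persists a checkpoint counter in a single file inside a job folder. It is not the code of `FileSystemCheckpointIDCounter`: the class name `FileBackedCounter`, the file name `counter`, the start value of 1, and the write-then-rename strategy are all assumptions, and the class does not implement Flink's `CheckpointIDCounter` interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical counter whose current value survives a process restart. */
final class FileBackedCounter {
    private final Path counterFile;

    FileBackedCounter(Path jobDir) {
        // Assumed file name inside the job's counter folder.
        this.counterFile = jobDir.resolve("counter");
    }

    /** Returns the stored value, or 1 if nothing has been stored yet. */
    synchronized long get() {
        try {
            if (!Files.exists(counterFile)) {
                return 1L;
            }
            byte[] bytes = Files.readAllBytes(counterFile);
            return Long.parseLong(new String(bytes, StandardCharsets.UTF_8).trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stores a new value by writing a temporary file and renaming it. */
    synchronized void setCount(long value) {
        try {
            Files.createDirectories(counterFile.getParent());
            Path tmp = counterFile.resolveSibling(counterFile.getFileName() + ".tmp");
            Files.write(tmp, Long.toString(value).getBytes(StandardCharsets.UTF_8));
            // Replacing the file in one move reduces the chance of a half-written counter.
            Files.move(tmp, counterFile, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the current value and persists the next one. */
    synchronized long getAndIncrement() {
        long current = get();
        setCount(current + 1);
        return current;
    }
}
```

      The `synchronized` methods only guard against concurrent access within one process. That is sufficient under the single JobManager assumption stated above, but it would not be safe with several JobManagers writing to the same volume.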

            People

              Assignee: Unassigned
              Reporter: Boris Lublinsky (borisl)
              Votes: 2
              Watchers: 12

                Time Tracking

                  Original Estimate: 168h
                  Remaining Estimate: 167h 50m
                  Logged: 10m