FLINK-27222

Execution history limit can lead to eviction of critical local-recovery information

Details

    Description

      Local recovery relies on knowing the allocation ID of the last deployment. To that end, we iterate over all previous execution attempts and use the last assignedAllocationID, if any.
      However, since the execution history is bounded (to 16 entries by default), this information can be evicted.

      In other words, with the default configuration (history limit = 16, restart delay = 1s), local recovery can only kick in if the TM is restarted within about 16 seconds: each restart attempt adds one history entry, so after 16 attempts the entry holding the allocation ID is gone.
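
      The eviction can be reproduced with a toy model. The sketch below is not Flink code: the class BoundedHistorySketch, the Attempt type, and the helper lookupAfterFailedRestarts are hypothetical names, and the model assumes that every restart attempt without a slot appends one entry with no allocation ID.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Optional;

/** Toy model of a bounded execution history (hypothetical, not Flink's classes). */
class BoundedHistorySketch {

    /** One execution attempt; allocationId is null if no slot was assigned. */
    static final class Attempt {
        final int number;
        final String allocationId;

        Attempt(int number, String allocationId) {
            this.number = number;
            this.allocationId = allocationId;
        }
    }

    private final int limit;
    private final Deque<Attempt> history = new ArrayDeque<>();

    BoundedHistorySketch(int limit) {
        this.limit = limit;
    }

    /** Appends an attempt, evicting the oldest entry once the limit is reached. */
    void add(Attempt attempt) {
        if (history.size() == limit) {
            history.removeFirst();
        }
        history.addLast(attempt);
    }

    /** Walks the history from newest to oldest and returns the first allocation ID found. */
    Optional<String> findLastAllocationId() {
        Iterator<Attempt> newestFirst = history.descendingIterator();
        while (newestFirst.hasNext()) {
            Attempt attempt = newestFirst.next();
            if (attempt.allocationId != null) {
                return Optional.of(attempt.allocationId);
            }
        }
        return Optional.empty();
    }

    /** One deployed attempt, then the given number of attempts that never got a slot. */
    static Optional<String> lookupAfterFailedRestarts(int limit, int failedRestarts) {
        BoundedHistorySketch sketch = new BoundedHistorySketch(limit);
        sketch.add(new Attempt(0, "alloc-1"));
        for (int i = 1; i <= failedRestarts; i++) {
            sketch.add(new Attempt(i, null));
        }
        return sketch.findLastAllocationId();
    }
}
```

      In this model, with a limit of 16 the allocation ID is still found after 15 failed restarts and lost after the 16th, which matches the 16-second window at a 1s restart delay.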

      We should decouple this information from the execution (history).
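
      One possible shape for such a decoupling, as a minimal sketch only: the last assigned allocation ID is kept in its own field, so history eviction cannot touch it. The class DecoupledAllocationSketch and its methods are hypothetical and do not describe the change that was actually made in Flink.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/** Hypothetical sketch: track the last allocation ID outside the bounded history. */
class DecoupledAllocationSketch {

    private final int historyLimit;
    // Bounded history of attempt numbers; still subject to eviction.
    private final Deque<Integer> attemptHistory = new ArrayDeque<>();
    // Kept separately, so it survives any number of evictions.
    private String lastAssignedAllocationId;

    DecoupledAllocationSketch(int historyLimit) {
        this.historyLimit = historyLimit;
    }

    /** Records an attempt; allocationId may be null if no slot was assigned. */
    void recordAttempt(int attemptNumber, String allocationId) {
        if (attemptHistory.size() == historyLimit) {
            attemptHistory.removeFirst();
        }
        attemptHistory.addLast(attemptNumber);
        if (allocationId != null) {
            lastAssignedAllocationId = allocationId;
        }
    }

    /** Returns the most recently assigned allocation ID, independent of the history. */
    Optional<String> findLastAllocationId() {
        return Optional.ofNullable(lastAssignedAllocationId);
    }

    int historySize() {
        return attemptHistory.size();
    }

    /** One deployed attempt, then the given number of attempts that never got a slot. */
    static Optional<String> lookupAfterFailedRestarts(int limit, int failedRestarts) {
        DecoupledAllocationSketch sketch = new DecoupledAllocationSketch(limit);
        sketch.recordAttempt(0, "alloc-1");
        for (int i = 1; i <= failedRestarts; i++) {
            sketch.recordAttempt(i, null);
        }
        return sketch.findLastAllocationId();
    }
}
```

      Here the lookup no longer depends on the history limit: the history stays capped at 16 entries while the allocation ID remains available after any number of restarts.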

            People

              chesnay Chesnay Schepler