Flink / FLINK-10287

Flink HA Persist Cancelled Job in Zookeeper


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels:
      None

      Description

      In HA mode, Flink keeps a canceled job persisted in ZooKeeper, which makes HA mode quite fragile. If the JobManager (JM) is restarted, it tries to recover the canceled job and, after some time, fails completely because it is unable to recover it.

      How to reproduce:

      1. Set up a Flink 1.6 cluster in HA mode.
      2. Cancel a running Flink job.
      3. Observe that Flink did not remove the job's ZooKeeper metadata:

      ls /flink/flink_ns/jobgraphs/46d8d3555936c0d8e6b6ec21cc02bb11
      [7f392fd9-cedc-4978-9186-1f54b98eeeb7]
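
The check in step 3 could be automated roughly as follows. This is only a sketch, not part of the original report: it assumes the job graphs live under `<root>/<cluster-id>/jobgraphs`, as the `ls` path above suggests, with `/flink` as the root and `flink_ns` as the cluster ID. The helper names `jobgraphs_path` and `stale_job_ids` are hypothetical, and `list_children` stands for any function that lists the children of a znode (for example `KazooClient.get_children` from the kazoo library).

```python
from typing import Callable, Iterable, List


def jobgraphs_path(root: str, cluster_id: str) -> str:
    """Build the znode path assumed to hold the persisted job graphs."""
    parts = (root, cluster_id, "jobgraphs")
    return "/" + "/".join(part.strip("/") for part in parts)


def stale_job_ids(
    list_children: Callable[[str], Iterable[str]],
    active_job_ids: Iterable[str],
    root: str = "/flink",          # assumed from the path in the report
    cluster_id: str = "flink_ns",  # assumed from the path in the report
) -> List[str]:
    """Return job IDs still stored in ZooKeeper that are no longer active."""
    active = set(active_job_ids)
    children = list_children(jobgraphs_path(root, cluster_id))
    return sorted(job_id for job_id in children if job_id not in active)
```

After the job is canceled, the list of active job IDs is empty, so any ID returned by `stale_job_ids` is metadata that would be picked up again on the next JM recovery.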

        Attachments

        1. Screenshot from 2018-09-05 16-48-34.png
          21 kB
          Sayat Satybaldiyev

            People

            • Assignee:
              Unassigned
              Reporter:
              sayatez Sayat Satybaldiyev
