Flink / FLINK-23300

Job fails very slowly because DeclarativeSlotManager does not call notifyAllocationFailure


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.13.1
    • Fix Version/s: None
    • Component/s: Runtime / Task
    • Labels: None

    Description

      When a container is killed, Flink on YARN detects the problem very quickly. However, with the default DeclarativeSlotManager, notifyAllocationFailure is not called, so the task does not fail until the heartbeat times out. As a result, failover is very slow.
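
For context, the heartbeat timeout mentioned above is controlled by the `heartbeat.interval` and `heartbeat.timeout` options in `flink-conf.yaml` (defaults of 10000 ms and 50000 ms in Flink 1.13). The fragment below is only a sketch of how the detection delay could be shortened as a stopgap. The values are illustrative assumptions, not something proposed in this issue, and an overly aggressive timeout can cause spurious failovers on a loaded cluster.

```yaml
# flink-conf.yaml (illustrative values, not a recommendation from this issue)

# How often heartbeat requests are sent, in milliseconds (default: 10000).
heartbeat.interval: 5000

# How long to wait for a heartbeat before the peer is considered lost,
# in milliseconds (default: 50000). Must be larger than heartbeat.interval.
heartbeat.timeout: 20000
```

This only shortens the fallback path. It does not restore the immediate allocation-failure notification that the description reports as missing.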

          People

            Assignee: Unassigned
            Reporter: Jiangang Liu
            Votes: 0
            Watchers: 2
