FLINK-12472

Support setting attemptFailuresValidityInterval of jobs on Yarn


Details

    • Yarn's attempt failure validity interval no longer defaults to the ask timeout `akka.ask.timeout`. Instead, it can now be configured independently via `yarn.application-attempt-failures-validity-interval`, which defaults to `10000` milliseconds.

    Description

      According to the Yarn documentation, a Yarn application can set an attemptFailuresValidityInterval to reset its count of failed application attempts.

       

      "attemptFailuresValidityInterval. The default value is -1. when attemptFailuresValidityInterval in milliseconds is set to > 0, the failure number will no take failures which happen out of the validityInterval into failure count. If failure count reaches to maxAppAttempts, the application will be failed."

       

      We can use this feature to keep long-running Flink jobs on Yarn alive: failures older than the interval would no longer count toward maxAppAttempts.
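
The counting rule quoted above can be illustrated with a small, self-contained model. This is only a sketch of the documented behaviour, not Yarn's actual implementation: the class name `AttemptFailureWindow`, its methods, and the timestamps are hypothetical, and treating a failure exactly on the window boundary as still counted is an assumption.

```java
import java.util.List;

/** Simplified model of the quoted counting rule; not YARN's actual implementation. */
public class AttemptFailureWindow {

    /**
     * Counts the failures that still matter at time nowMs.
     * A validity interval <= 0 (such as the default -1) means every failure counts.
     */
    static int countedFailures(List<Long> failureTimesMs, long nowMs, long validityIntervalMs) {
        int count = 0;
        for (long t : failureTimesMs) {
            // Assumption: a failure exactly on the boundary is still inside the window.
            boolean insideWindow = validityIntervalMs <= 0 || nowMs - t <= validityIntervalMs;
            if (insideWindow) {
                count++;
            }
        }
        return count;
    }

    /** The application is failed once the counted failures reach maxAppAttempts. */
    static boolean applicationFails(List<Long> failureTimesMs, long nowMs,
                                    long validityIntervalMs, int maxAppAttempts) {
        return countedFailures(failureTimesMs, nowMs, validityIntervalMs) >= maxAppAttempts;
    }

    public static void main(String[] args) {
        // Hypothetical failure timestamps: one early failure, two recent ones.
        List<Long> failures = List.of(1_000L, 60_000L, 65_000L);
        long now = 66_000L;

        System.out.println(countedFailures(failures, now, -1L));         // 3
        System.out.println(countedFailures(failures, now, 10_000L));     // 2
        System.out.println(applicationFails(failures, now, 10_000L, 3)); // false
    }
}
```

With the interval disabled, all three failures count and an application limited to three attempts would be failed. With a 10000 ms interval, the early failure drops out of the window, so the same application keeps running.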

            People

              victor-wong jiasheng55
              victor-wong jiasheng55
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m