  1. Hadoop YARN
  2. YARN-896 Roll up for long-lived services in YARN
  3. YARN-3669

Attempt-failures validity interval should have a global admin-configurable lower limit

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: resourcemanager

    Description

      Found this while reviewing YARN-3480.

      When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to a small value, the number of retried attempts can become very large, so we would need to delete some of the attempts stored in RMStateStore.

      I think we need a lower limit on the failure-validity interval to avoid situations like this.

      Having this limit will avoid pardoning too many failures in too short a duration.
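
      To illustrate the idea only (this is not the code in the attached patches), here is a minimal sketch of how an admin-configured floor could be applied to the interval an application requests. The class name, method name, and the notion of a single `adminMinMs` setting are assumptions made for this example. It relies on the documented behaviour that a value <= 0 disables the validity interval, in which case no failures are pardoned and no floor is needed.

```java
public class ValidityIntervalLimit {

    /**
     * Returns the interval the RM would actually use.
     * requestedMs: value from the application's submission context.
     * adminMinMs:  hypothetical cluster-wide lower limit set by the admin.
     */
    static long effectiveInterval(long requestedMs, long adminMinMs) {
        // A non-positive value means the feature is off: every failure
        // counts toward max attempts, so there is nothing to clamp.
        if (requestedMs <= 0) {
            return requestedMs;
        }
        // Otherwise never allow an interval shorter than the admin floor.
        return Math.max(requestedMs, adminMinMs);
    }

    public static void main(String[] args) {
        long adminMinMs = 10_000L; // assumed floor of 10 seconds
        System.out.println(effectiveInterval(100L, adminMinMs));    // too small, raised to the floor
        System.out.println(effectiveInterval(60_000L, adminMinMs)); // already above the floor
        System.out.println(effectiveInterval(-1L, adminMinMs));     // disabled, left unchanged
    }
}
```

      Raising a too-small value to the floor, rather than rejecting the submission, is just one possible policy; the issue itself does not say which behaviour is intended.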

      Attachments

        1. YARN-3669.2.patch
          5 kB
          Xuan Gong
        2. YARN-3669.1.patch
          5 kB
          Xuan Gong

        Activity

          People

            xgong Xuan Gong
            vinodkv Vinod Kumar Vavilapalli
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated: