Hadoop YARN / YARN-1380

Enable NM to automatically reuse failed local dirs after they are available again


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels:

      Description

      Currently the NM can remove bad directories from use when they fail, but it cannot reuse them once they are fixed. This is inconvenient in large production clusters.
      In this JIRA I propose a patch that I am using in my organization.
      The patch also adds a new metric for the number of failed directories, so that operators have a clearer view of disk health from outside the NM.
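      The proposed behaviour can be pictured with a minimal Java sketch. It is not the patch attached to this issue and does not use the NodeManager's real classes; the class `LocalDirsTracker`, its methods, and the injected health check are hypothetical. On each check, good dirs that fail are moved to a failed list (the current behaviour), previously failed dirs that pass again are moved back (the proposed behaviour), and the failed-dir count stands in for the proposed metric.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not the YARN-1380 patch and not a Hadoop API):
 * tracks local dirs as good or failed and re-admits failed dirs
 * once the injected health check passes again.
 */
public class LocalDirsTracker {
  private List<String> goodDirs = new ArrayList<>();
  private List<String> failedDirs = new ArrayList<>();
  // Caller-supplied check, e.g. "dir exists and is writable" (assumption).
  private final Predicate<String> isHealthy;

  public LocalDirsTracker(List<String> dirs, Predicate<String> isHealthy) {
    this.goodDirs.addAll(dirs);
    this.isHealthy = isHealthy;
  }

  /** Re-checks all dirs; returns true if the good/failed split changed. */
  public synchronized boolean checkDirs() {
    List<String> newGood = new ArrayList<>();
    List<String> newFailed = new ArrayList<>();
    // Currently good dirs: kick out any that now fail (existing behaviour).
    for (String dir : goodDirs) {
      if (isHealthy.test(dir)) {
        newGood.add(dir);
      } else {
        newFailed.add(dir);
      }
    }
    // Previously failed dirs: re-admit any that pass again (proposed behaviour).
    for (String dir : failedDirs) {
      if (isHealthy.test(dir)) {
        newGood.add(dir);
      } else {
        newFailed.add(dir);
      }
    }
    // Good dirs are visited first in their existing order, so the list is
    // unchanged exactly when no dir moved between good and failed.
    boolean changed = !newGood.equals(goodDirs);
    goodDirs = newGood;
    failedDirs = newFailed;
    return changed;
  }

  /** Snapshot of the dirs currently usable for localization. */
  public synchronized List<String> getGoodDirs() {
    return Collections.unmodifiableList(new ArrayList<>(goodDirs));
  }

  /** Number of failed dirs, i.e. the value the proposed metric would report. */
  public synchronized int getNumFailedDirs() {
    return failedDirs.size();
  }
}
```

      In a NodeManager such a check would presumably run periodically. Whether a repaired dir should be re-admitted after one passing check or only after several consecutive ones is a design choice this sketch leaves open.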


              People

              • Assignee: Unassigned
              • Reporter: Hou Song (thehousong)
              • Votes: 0
              • Watchers: 6

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  • Estimated: 48h (original estimate)
                  • Remaining: 48h
                  • Logged: Not specified