Hadoop YARN / YARN-929

Two MRAppMasters running in parallel for the same job ID

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.5-alpha
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      Configuration:
      yarn.resourcemanager.am.max-retries = 3

      Scenario:
      The NodeManager is killed forcefully, i.e. using kill -9 NM_PID.
      After the node expires, the RM kills all the containers that were running on this NodeManager.
      However, the MRAppMaster JVM is still running.
      The RM spawns a second MRAppMaster attempt, since AM retries are configured as 3. At this point, two MRAppMasters are running in parallel for the same job ID.

      The problem with running two MRAppMasters is that the first-attempt AppMaster deletes the job information from HDFS, which causes a FileNotFoundException in the second-attempt MRAppMaster.
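
The failure mode described above can be sketched with a local-filesystem analogy. This is a minimal sketch under stated assumptions, not Hadoop code: the class and method names are hypothetical, a temporary local directory stands in for the job's staging directory on HDFS, and java.nio's NoSuchFileException stands in for the FileNotFoundException reported on HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Local-filesystem analogy only: none of these names come from Hadoop.
class StagingRaceSketch {

    // Hypothetical stand-in for job submission writing job metadata
    // into the staging directory shared by all attempts of the job.
    static Path submitJob(Path stagingDir) throws IOException {
        Files.createDirectories(stagingDir);
        byte[] conf = "<configuration/>".getBytes(StandardCharsets.UTF_8);
        return Files.write(stagingDir.resolve("job.xml"), conf);
    }

    // Hypothetical stand-in for the first attempt, which is still alive,
    // finishing and deleting the job information.
    static void cleanupStaging(Path stagingDir) throws IOException {
        Files.deleteIfExists(stagingDir.resolve("job.xml"));
        Files.deleteIfExists(stagingDir);
    }

    // Hypothetical stand-in for the second attempt reading job metadata.
    // Returns true if the read succeeds, false if the file is already gone.
    static boolean secondAttemptCanReadJob(Path stagingDir) throws IOException {
        try {
            Files.readAllBytes(stagingDir.resolve("job.xml"));
            return true;
        } catch (NoSuchFileException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging").resolve("job_0001");
        submitJob(staging);
        System.out.println("before cleanup: " + secondAttemptCanReadJob(staging));
        cleanupStaging(staging);
        System.out.println("after cleanup:  " + secondAttemptCanReadJob(staging));
    }
}
```

The read succeeds before the cleanup and fails after it, which mirrors the ordering in the scenario: both attempts share one staging location, so a cleanup by the first attempt removes files the second attempt still needs.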

            People

              Assignee: Unassigned
              Reporter: Rohith Sharma K S (rohithsharma)
              Votes: 0
              Watchers: 4