Hadoop Map/Reduce / MAPREDUCE-3730

Allow restarted NM to rejoin cluster before RM expires it

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.23.1, 2.0.0-alpha
    • Fix Version/s: 0.23.2
    • Component/s: mrv2, resourcemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Modified the RM to allow restarted NMs to rejoin the cluster without waiting for expiry.

      Description

      When a node in the RUNNING state (healthy or unhealthy) is rebooted, the resourcemanager rejects the nodemanager's registration request as a duplicate because it believes the nodemanager is already running on that node. It won't allow the node to rejoin the cluster until the node expiration time elapses, which is 10+ minutes by default. We should allow the NM to rejoin the cluster if it re-registers within the expiration timeout.

      Note that this problem occurs with NMs that are configured to use specific ports. If ephemeral ports are used, then an NM reboot "works" because the RM thinks the NM registration is for a new node. See the discussions in MAPREDUCE-3070 and MAPREDUCE-3363.
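
      The behavior described above can be illustrated with a small, self-contained sketch. This is not the code from the attached patch: the class NodeRegistrySketch, the Outcome enum, the allowRejoin flag, and the port numbers are all hypothetical names and values chosen for illustration. It assumes only what the description states, namely that the RM identifies a node by its host and port and treats a registration from a known, unexpired node as a duplicate.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the registration check; not the actual RM code. */
class NodeRegistrySketch {
    enum Outcome { NEW_NODE, REJOINED, REJECTED_DUPLICATE }

    // Last time each node (keyed by "host:port") registered, in milliseconds.
    private final Map<String, Long> lastSeenMillis = new HashMap<>();
    private final long expiryMillis;
    // false models the behavior reported in this issue, true the requested one.
    private final boolean allowRejoin;

    NodeRegistrySketch(long expiryMillis, boolean allowRejoin) {
        this.expiryMillis = expiryMillis;
        this.allowRejoin = allowRejoin;
    }

    /** Assumption: a node's identity is its host plus its configured port. */
    static String nodeId(String host, int port) {
        return host + ":" + port;
    }

    Outcome register(String host, int port, long nowMillis) {
        String id = nodeId(host, port);
        Long lastSeen = lastSeenMillis.get(id);
        // The node is "known" if it registered before and has not yet expired.
        boolean known = lastSeen != null && nowMillis - lastSeen < expiryMillis;
        if (known && !allowRejoin) {
            // Reported behavior: the restarted NM is treated as a duplicate.
            return Outcome.REJECTED_DUPLICATE;
        }
        // Requested behavior: replace the stale entry and let the NM back in.
        lastSeenMillis.put(id, nowMillis);
        return known ? Outcome.REJOINED : Outcome.NEW_NODE;
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60 * 1000L;

        NodeRegistrySketch before = new NodeRegistrySketch(tenMinutes, false);
        System.out.println(before.register("host1", 45454, 0L));      // NEW_NODE
        System.out.println(before.register("host1", 45454, 60_000L)); // REJECTED_DUPLICATE

        NodeRegistrySketch after = new NodeRegistrySketch(tenMinutes, true);
        System.out.println(after.register("host1", 45454, 0L));       // NEW_NODE
        System.out.println(after.register("host1", 45454, 60_000L));  // REJOINED

        // A different (e.g. ephemeral) port yields a different node identity,
        // so even the old check accepts the restarted NM as a new node.
        System.out.println(before.register("host1", 51234, 60_000L)); // NEW_NODE
    }
}
```

      With allowRejoin set to false, the second registration from the same host and port is rejected until the expiry interval has passed; with it set to true, the same request replaces the old entry. The last call shows why the problem stays hidden when ephemeral ports are used.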

      1. MAPREDUCE-3730.patch
        26 kB
        Jason Lowe
      2. MAPREDUCE-3730.patch
        23 kB
        Jason Lowe
      3. MAPREDUCE-3730.patch
        22 kB
        Jason Lowe

        Activity

        Jason Lowe created issue -
        Jason Lowe made changes -
        Field Original Value New Value
        Attachment MAPREDUCE-3730.patch [ 12511913 ]
        Jason Lowe made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Target Version/s 0.23.1, 0.24.0 [ 12318883, 12317654 ]
        Vinod Kumar Vavilapalli made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Target Version/s 0.24.0, 0.23.1 [ 12317654, 12318883 ] 0.23.1, 0.24.0 [ 12318883, 12317654 ]
        Jason Lowe made changes -
        Attachment MAPREDUCE-3730.patch [ 12512187 ]
        Jason Lowe made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Target Version/s 0.24.0, 0.23.1 [ 12317654, 12318883 ] 0.23.1, 0.24.0 [ 12318883, 12317654 ]
        Vinod Kumar Vavilapalli made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Target Version/s 0.24.0, 0.23.1 [ 12317654, 12318883 ] 0.23.1, 0.24.0 [ 12318883, 12317654 ]
        Fix Version/s 0.23.2 [ 12319851 ]
        Vinod Kumar Vavilapalli made changes -
        Target Version/s 0.24.0, 0.23.1 [ 12317654, 12318883 ] 0.23.1, 0.24.0 [ 12318883, 12317654 ]
        Priority Major [ 3 ] Minor [ 4 ]
        Jason Lowe made changes -
        Attachment MAPREDUCE-3730.patch [ 12515032 ]
        Jason Lowe made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Target Version/s 0.24.0, 0.23.1 [ 12317654, 12318883 ] 0.24.0, 0.23.2 [ 12317654, 12319851 ]
        Vinod Kumar Vavilapalli made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Release Note Modified RM to allow restarted NMs to be able to join the cluster without waiting for expiry.
        Target Version/s 0.23.2, 0.24.0 [ 12319851, 12317654 ] 0.23.2 [ 12319851 ]
        Resolution Fixed [ 1 ]
        Allen Wittenauer made changes -
        Affects Version/s 2.0.0-alpha [ 12320354 ]
        Affects Version/s 0.24.0 [ 12317654 ]

          People

          • Assignee:
            Jason Lowe
            Reporter:
            Jason Lowe
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:
