Hadoop Map/Reduce: MAPREDUCE-3363

The "totalnodes" and "memorytotal" fields show wrong information if the nodes are going down and coming up early(before 10min)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 0.23.0, 0.24.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None

      Description

      Node details are not moved from the total-nodes list to the lost-nodes list for 600000 ms (the node expiry interval). So if a node goes down and comes back up before the expiry interval has passed, the cluster status reports wrong values for the total node count and the total cluster memory.
      At a minimum, if the same node comes back up it should not be counted as a new node; at no point should duplicate nodes appear in the total-nodes list.
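
      For reference, the 600000 ms window above is the ResourceManager's node-manager expiry interval. A minimal sketch of reading it follows; the property name is taken from yarn-default.xml, while the class itself is purely illustrative:

          import org.apache.hadoop.conf.Configuration;

          public class NmExpiryCheck {
              public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  // How long the RM waits before declaring a NodeManager lost;
                  // defaults to 600000 ms (10 min), matching the description above.
                  long expiryMs = conf.getLong(
                          "yarn.nm.liveness-monitor.expiry-interval-ms", 600000L);
                  System.out.println("NM expiry interval: " + expiryMs + " ms");
              }
          }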

      Attachments

      1. Applications.htm (19 kB, Devaraj K)
      2. ASF.LICENSE.NOT.GRANTED--screenshot-1.jpg (90 kB, Ramgopal N)

        Issue Links

          Activity

          Ramgopal N created issue
          Ramgopal N made changes:
            Attachment: screenshot-1.jpg [ 12502732 ]
          Ravi Teja Ch N V made changes:
            Link: This issue is related to MAPREDUCE-3070 [ MAPREDUCE-3070 ]
          Devaraj K made changes:
            Assignee: Devaraj K [ devaraj.k ]
          Ravi Teja Ch N V made changes:
            Link: This issue relates to MAPREDUCE-3494 [ MAPREDUCE-3494 ]
          Devaraj K added a comment:

          Attaching the screen captured when many jobs were submitted in this scenario. Most of the jobs fail.

          Devaraj K made changes:
            Attachment: Applications.htm [ 12505864 ]
          Devaraj K added a comment:

          This issue can cause a huge loss: if multiple nodes restart at the same time, many jobs will fail in the cluster because of the falsely assumed cluster capacity, which is a significant performance hit.

          Mahadev konar made changes:
            Priority: Major [ 3 ] → Critical [ 2 ]
          Ravi Teja Ch N V made changes:
            Link: This issue relates to MAPREDUCE-3271 [ MAPREDUCE-3271 ]
          Devaraj K made changes:
            Assignee: Devaraj K [ devaraj.k ]
          Jason Lowe added a comment:

          It looks like the problem occurs because ephemeral ports are configured for the NodeManagers. NMs are identified by host:port pairs, and when ephemeral ports are used we lose the ability to differentiate between a new node joining the cluster and a lost node rejoining the cluster.

          In the screenshot's scenario, the ResourceManager believes that 4 nodes are in the cluster, and only after the NM timeout interval (default 10 min) will it realize 3 of the 4 nodes aren't there. This is not much different from a cluster that has 4 separate NM machines where three of the NMs go down at the same time. The cluster capacity will be wrong within the timeout interval because the lost capacity will not yet have been noticed by the RM.

          If ephemeral ports are not used then this problem cannot occur today because MAPREDUCE-3070 did not really fix the quick NM reboot scenario. The NM reboot scenario only "works" with ephemeral ports because the RM sees it as a new NM joining the cluster (and a subsequent loss of an NM after the NM timeout) rather than a reboot of an existing NM. If a cluster is configured without ephemeral ports then a restarting NM cannot rejoin the cluster until after the NM timeout interval has passed on the RM, and by then the node's resources will have been removed from the cluster before being added back in when it rejoins.

          Ideally we should put in a real fix for MAPREDUCE-3070 so the RM can realize an existing NM trying to join the cluster is a reboot scenario instead of rejecting the new NM instance. Of course, the RM would have to kill off all the existing containers for the NM when it rejoins.

          The issue of detecting the difference between a new NM joining and an existing NM rejoining when ephemeral ports are configured is being tracked in MAPREDUCE-3585.
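
          To make the identity problem concrete, here is a toy sketch (hypothetical names and numbers, not the RM's actual code) of what happens when nodes are keyed by host:port and one NM restarts on a fresh ephemeral port:

              import java.util.HashMap;
              import java.util.Map;

              public class NodeIdentitySketch {
                  // Nodes tracked by "host:port" -> memory (MB), mimicking how NMs are identified.
                  static final Map<String, Integer> totalNodes = new HashMap<>();

                  static void register(String host, int port, int memoryMb) {
                      totalNodes.put(host + ":" + port, memoryMb);
                  }

                  public static void main(String[] args) {
                      register("node1", 45454, 8192); // NM on a fixed port
                      register("node2", 51234, 8192); // NM on an ephemeral port
                      // node2 restarts; the OS hands out a different ephemeral port,
                      // so the tracker sees a brand-new node instead of a rejoin:
                      register("node2", 52817, 8192);
                      // The stale node2:51234 entry lingers until the expiry interval
                      // fires, so both totals below are inflated.
                      int totalMemoryMb = totalNodes.values().stream()
                              .mapToInt(Integer::intValue).sum();
                      System.out.println("total nodes  = " + totalNodes.size());     // 3, not 2
                      System.out.println("total memory = " + totalMemoryMb + " MB"); // 24576, not 16384
                  }
              }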

          Jason Lowe made changes:
            Link: This issue relates to MAPREDUCE-3585 [ MAPREDUCE-3585 ]
          Mahadev konar made changes:
            Affects Version/s: 0.23.0 [ 12315570 ]
          Jason Lowe made changes:
            Link: This issue is duplicated by MAPREDUCE-3585 [ MAPREDUCE-3585 ]
          Jason Lowe made changes:
            Link: This issue relates to MAPREDUCE-3585 [ MAPREDUCE-3585 ]
          Vinod Kumar Vavilapalli added a comment:

          Agree with Jason. Because NMs bind to ephemeral ports, it is not possible to make sure that it is the same node.

          Arguably 10 minutes is too long a time with YARN's (ultra cool) infrastructure, so we plan to shorten the default. That should mitigate the issue a bit.
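
          Lowering the interval would be a yarn-site.xml change on the ResourceManager; a hedged sketch of such an override follows (shown programmatically, with 120000 ms as an arbitrary example rather than the planned new default):

              import org.apache.hadoop.conf.Configuration;

              public class LowerNmExpiry {
                  public static void main(String[] args) {
                      Configuration conf = new Configuration();
                      // In a real deployment this override lives in yarn-site.xml on the RM.
                      conf.setLong("yarn.nm.liveness-monitor.expiry-interval-ms", 120000L);
                      System.out.println("NM expiry interval overridden to "
                              + conf.getLong("yarn.nm.liveness-monitor.expiry-interval-ms",
                                             600000L) + " ms");
                  }
              }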

          Vinod Kumar Vavilapalli made changes:
            Status: Open [ 1 ] → Resolved [ 5 ]
            Resolution: Won't Fix [ 2 ]

            People

            • Assignee: Unassigned
            • Reporter: Ramgopal N
            • Votes: 0
            • Watchers: 4
