Hadoop Common / HADOOP-16385

Namenode crashes with "RedundancyMonitor thread received Runtime exception"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      *Description:* While removing dead nodes, the NameNode went down with the error "RedundancyMonitor thread received Runtime exception".

      *Environment:*
      Server OS: Ubuntu
      Cluster nodes: 1 NN / 225 DNs / 3 ZK / 2 RM / 4850 NMs
      240 machines in total, each running 21 Docker containers (1 DN and 20 NMs)

      Steps:
      1. Total number of containers in the running state: ~53000.
      2. Because of the load, machines ran out of memory and restarted, which restarted all the Docker containers on them, including the NMs and DNs.
      3. At some point the NameNode threw the error below while removing a node, and the NN went down.

      2019-06-19 05:54:07,262 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /rack-1550/255.255.117.195:23735
      2019-06-19 05:54:07,263 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* removeDeadDatanode: lost heartbeat from 255.255.117.151:23735, removeBlocksFromBlockMap true
      2019-06-19 05:54:07,281 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /rack-4097/255.255.117.151:23735
      2019-06-19 05:54:07,282 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* removeDeadDatanode: lost heartbeat from 255.255.116.213:23735, removeBlocksFromBlockMap true
      2019-06-19 05:54:07,290 ERROR org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: RedundancyMonitor thread received Runtime exception.
      java.lang.IllegalArgumentException: 247 should >= 248, and both should be positive.
              at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
              at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:575)
              at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:552)
              at org.apache.hadoop.hdfs.net.DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial(DFSNetworkTopology.java:122)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseDataNode(BlockPlacementPolicyDefault.java:873)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:770)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:712)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:507)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:425)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargets(BlockPlacementPolicyDefault.java:311)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:103)
              at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:51)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1902)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1854)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4842)
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4709)
              at java.lang.Thread.run(Thread.java:748)
      2019-06-19 05:54:07,296 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.lang.IllegalArgumentException: 247 should >= 248, and both should be positive.
      2019-06-19 05:54:07,298 INFO org.apache.hadoop.hdfs.server.common.HadoopAuditLogger.audit: process=Namenode     operation=shutdown      result=invoked
      2019-06-19 05:54:07,298 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
      /************************************************************
      SHUTDOWN_MSG: Shutting down NameNode at namenode/255.255.182.104
      ************************************************************/
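
The message in the trace ("247 should >= 248, and both should be positive.") comes from an argument check reached via NetworkTopology.chooseRandom. A minimal, self-contained sketch of a check that produces the same message is shown below. It is not the Hadoop source: the class name, the method names, and the reading of the two numbers as "total nodes in scope" and "available nodes" are assumptions made for illustration, since the report shows only the message text.

```java
import java.util.Locale;

// Illustrative sketch only; names are hypothetical and not taken from Hadoop.
class ChooseRandomPreconditionSketch {

    // Rejects the pair unless total >= available and available > 0.
    // Assumed meaning of the arguments: the first is the number of nodes
    // in the chosen scope, the second is how many of them are eligible.
    static void checkCounts(int totalInScopeNodes, int availableNodes) {
        if (!(totalInScopeNodes >= availableNodes && availableNodes > 0)) {
            throw new IllegalArgumentException(String.format(Locale.ROOT,
                "%d should >= %d, and both should be positive.",
                totalInScopeNodes, availableNodes));
        }
    }

    // Returns the failure message, or null when the check passes.
    static String failureMessage(int totalInScopeNodes, int availableNodes) {
        try {
            checkCounts(totalInScopeNodes, availableNodes);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Consistent counts pass the check.
        System.out.println(failureMessage(248, 248));
        // The pair from the log fails the check.
        System.out.println(failureMessage(247, 248));
    }
}
```

If the two counts are read at different moments while nodes are being removed from the topology, the first can drop below the second, as in the 247/248 pair above. Whether that is what happened here is a guess from the log ordering, not something the report confirms.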
      
      
      

        Attachments

        1. HADOOP-16385-HDFS_UT.patch
          2 kB
          Ayush Saxena
        2. HADOOP-16385-03.patch
          1 kB
          Ayush Saxena
        3. HADOOP-16385-02.patch
          2 kB
          Ayush Saxena
        4. HADOOP-16385-01.patch
          1 kB
          Ayush Saxena
        5. HADOOP-16385.branch-3.1.001.patch
          3 kB
          He Xiaoqiao

              People

              • Assignee:
                ayushtkn Ayush Saxena
                Reporter:
                mkris.reddy@gmail.com krishna reddy
              • Votes:
                0
                Watchers:
                9
