Hadoop HDFS / HDFS-13718

So many NotEnoughReplicasException in active NN logs


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

    Description

      After upgrading Hadoop from 2.7.3 to 3.0.0, I see many messages about replication errors (caused by Rack Awareness) in the active NN logs:

      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: 2018-02-28 01:57:20,804 WARN [IPC Server handler 10 on 8020] PmsRackMapping - Got empty rack for 10.136.2.149, reverting to default.
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: 2018-02-28 01:57:20,806 DEBUG [IPC Server handler 10 on 8020] BlockPlacementPolicy - Failed to choose from local rack (location = /default-rack); the second replica is not found, retry choosing randomly
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:792)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:691)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:598)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:558)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:461)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:392)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:268)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:121)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:137)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2093)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:287)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2602)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:864)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:549)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at java.security.AccessController.doPrivileged(Native Method)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at javax.security.auth.Subject.doAs(Subject.java:422)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
      Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
      
      $ zgrep -c NotEnoughReplicasException archive/datalab-namenode.log-20180{22*,3*}.gz
      archive/datalab-namenode.log-20180228.gz:0       #hadoop 2.7.3 
      archive/datalab-namenode.log-20180301.gz:173492  #hadoop 3.0.0
      archive/datalab-namenode.log-20180302.gz:153192  #hadoop 3.0.0
      archive/datalab-namenode.log-20180303.gz:0       #hadoop 2.7.3
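
The "Got empty rack for ..." warning in the excerpt names the address that the rack mapping could not resolve. To see which addresses trigger the fallback to the default rack, a tally along the following lines may help. This is only a sketch: it assumes the log lines keep the wording shown in the excerpt, and the function name `empty_rack_ips` is hypothetical, chosen here for illustration.

```shell
# Tally the addresses for which the rack mapping returned an empty rack.
# Reads NameNode log lines on stdin and prints "<count> <address>",
# most frequent first. Assumes the message wording from the excerpt above.
empty_rack_ips() {
  grep -o 'Got empty rack for [0-9.]*' |  # keep only the matching fragment
    awk '{print $5}' |                    # fifth word is the address
    sort | uniq -c | sort -rn             # count per address, descending
}
```

It could be fed from one of the rotated archives, for example `zcat archive/datalab-namenode.log-20180301.gz | empty_rack_ips`.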

            People

              Assignee: Unassigned
              Reporter: Yuriy Malygin
              Votes: 0
              Watchers: 5
