Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.0
None
None
Description
After upgrading Hadoop from 2.7.3 to 3.0.0, the active NameNode log contains many messages about replication errors (caused by rack awareness):
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: 2018-02-28 01:57:20,804 WARN [IPC Server handler 10 on 8020] PmsRackMapping - Got empty rack for 10.136.2.149, reverting to default.
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: 2018-02-28 01:57:20,806 DEBUG [IPC Server handler 10 on 8020] BlockPlacementPolicy - Failed to choose from local rack (location = /default-rack); the second replica is not found, retry choosing randomly
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:792)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:691)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:598)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:558)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:461)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:392)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:268)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:121)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:137)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2093)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:287)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2602)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:864)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:549)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at java.security.AccessController.doPrivileged(Native Method)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at javax.security.auth.Subject.doAs(Subject.java:422)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
Feb 28 01:57:20 srvg671 datalab-namenode[1807]: at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
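The log lines suggest a chain: the custom PmsRackMapping returns an empty rack for 10.136.2.149, the address reverts to /default-rack, no candidate is found on that rack, and the placement policy logs NotEnoughReplicasException at DEBUG before "retry choosing randomly". The following is a minimal Python model of that chain, assuming the writer's address has no rack entry while the DataNodes are mapped to real racks. It is a simplified illustration, not Hadoop's BlockPlacementPolicyDefault code; `resolve_rack`, `choose_local_rack_then_random`, and the rack table are hypothetical names.

```python
import random

DEFAULT_RACK = "/default-rack"  # fallback location reported in the DEBUG message


class NotEnoughReplicasError(Exception):
    """Hypothetical stand-in for BlockPlacementPolicy$NotEnoughReplicasException."""


def resolve_rack(host, rack_table):
    """Hypothetical rack mapping: revert to the default rack when the lookup is empty."""
    rack = rack_table.get(host, "")
    return rack or DEFAULT_RACK


def choose_from_rack(rack, datanodes, excluded):
    """Pick a random DataNode on `rack` that has not been chosen yet."""
    candidates = [dn for dn, dn_rack in datanodes.items()
                  if dn_rack == rack and dn not in excluded]
    if not candidates:
        raise NotEnoughReplicasError(f"no DataNode available on {rack}")
    return random.choice(candidates)


def choose_local_rack_then_random(writer, rack_table, datanodes, excluded, log):
    """Try the writer's rack first; on failure, log and fall back to any DataNode."""
    local_rack = resolve_rack(writer, rack_table)
    try:
        return choose_from_rack(local_rack, datanodes, excluded)
    except NotEnoughReplicasError as exc:
        # The exception is caught here, as "retry choosing randomly" in the log implies.
        log.append(f"Failed to choose from local rack (location = {local_rack}); "
                   f"retry choosing randomly: {exc}")
        candidates = [dn for dn in datanodes if dn not in excluded]
        if not candidates:
            raise  # nowhere left to place the replica
        return random.choice(candidates)


# Usage: the writer has no rack entry, the DataNodes sit on real racks.
datanodes = {"dn1": "/rack1", "dn2": "/rack2"}
log = []
target = choose_local_rack_then_random("10.136.2.149", {}, datanodes, set(), log)
print(target, log)
```

In this model every write from an unmapped address produces one logged exception even though a target is still found, which is one possible reason the message volume scales with write traffic.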
Count of NotEnoughReplicasException lines per archived NameNode log, annotated with the Hadoop version in use:
$ zgrep -c NotEnoughReplicasException archive/datalab-namenode.log-20180{22*,3*}.gz
archive/datalab-namenode.log-20180228.gz:0 #hadoop 2.7.3
archive/datalab-namenode.log-20180301.gz:173492 #hadoop 3.0.0
archive/datalab-namenode.log-20180302.gz:153192 #hadoop 3.0.0
archive/datalab-namenode.log-20180303.gz:0 #hadoop 2.7.3
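The same per-file counts can be reproduced without zgrep. This is a minimal Python sketch that counts matching lines in each gzip-compressed log, as `zgrep -c` does. Python's `glob` does not expand shell braces such as `{22*,3*}`, so the sketch uses a broader pattern; the pattern and the function names `count_matches` and `count_per_file` are assumptions for illustration.

```python
import glob
import gzip


def count_matches(path, needle):
    """Count lines containing `needle` in a gzip-compressed log, like `zgrep -c`."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return sum(1 for line in fh if needle in line)


def count_per_file(pattern, needle="NotEnoughReplicasException"):
    """Map each file matching the glob `pattern` to its count of matching lines."""
    return {path: count_matches(path, needle) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # Assumed archive layout; adjust the pattern to the actual log directory.
    for path, n in count_per_file("archive/datalab-namenode.log-20180*.gz").items():
        print(f"{path}:{n}")
```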
Issue Links
- is related to HDFS-13236: Standby NN down with error encountered while tailing edits (Open)