Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.7.1
Fix Version/s: None
Component/s: None
Labels: None
Environment: Hortonworks 2.3 build 2557; 10 DataNodes, 2 NameNodes with automatic failover
Description
After writing a relatively small number of files to HDFS (at least 1,000 files, 150 MB to 1.6 GB each), we found 13 damaged files with an incomplete last block. fsck output for one of them:
hadoop fsck /hadoop/files/load_tarifer-zf-4_20160902165521521.csv -openforwrite -files -blocks -locations
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
Connecting to namenode via http://hadoop-hdfs:50070/fsck?ugi=hdfs&openforwrite=1&files=1&blocks=1&locations=1&path=%2Fstaging%2Flanding%2Fstream%2Fitc_dwh%2Ffiles%2Fload_tarifer-zf-4_20160902165521521.csv
FSCK started by hdfs (auth:SIMPLE) from /10.0.0.178 for path /hadoop/files/load_tarifer-zf-4_20160902165521521.csv at Mon Oct 10 17:12:25 MSK 2016
/hadoop/files/load_tarifer-zf-4_20160902165521521.csv 920596121 bytes, 7 block(s), OPENFORWRITE: MISSING 1 blocks of total size 115289753 B
0. BP-1552885336-10.0.0.178-1446159880991:blk_1084952841_17798971 len=134217728 repl=4 [DatanodeInfoWithStorage[10.0.0.188:50010,DS-9ba44a76-113a-43ac-87dc-46aa97ba3267,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK], DatanodeInfoWithStorage[10.0.0.184:50010,DS-ec462491-6766-490a-a92f-38e9bb3be5ce,DISK], DatanodeInfoWithStorage[10.0.0.182:50010,DS-cef46399-bb70-4f1a-ac55-d71c7e820c29,DISK]]
1. BP-1552885336-10.0.0.178-1446159880991:blk_1084952850_17799207 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.184:50010,DS-412769e0-0ec2-48d3-b644-b08a516b1c2c,DISK], DatanodeInfoWithStorage[10.0.0.181:50010,DS-97388b2f-c542-417d-ab06-c8d81b94fa9d,DISK], DatanodeInfoWithStorage[10.0.0.187:50010,DS-e7a11951-4315-4425-a88b-a9f6429cc058,DISK]]
2. BP-1552885336-10.0.0.178-1446159880991:blk_1084952857_17799489 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.184:50010,DS-7a08c597-b0f4-46eb-9916-f028efac66d7,DISK], DatanodeInfoWithStorage[10.0.0.180:50010,DS-fa6a4630-1626-43d8-9988-955a86ac3736,DISK], DatanodeInfoWithStorage[10.0.0.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
3. BP-1552885336-10.0.0.178-1446159880991:blk_1084952866_17799725 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.185:50010,DS-b5ff8ba0-275e-4846-b5a4-deda35aa0ad8,DISK], DatanodeInfoWithStorage[10.0.0.180:50010,DS-9cb6cade-9395-4f3a-ab7b-7fabd400b7f2,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-e277dcf3-1bce-4efd-a668-cd6fb2e10588,DISK]]
4. BP-1552885336-10.0.0.178-1446159880991:blk_1084952872_17799891 len=134217728 repl=4 [DatanodeInfoWithStorage[10.0.0.184:50010,DS-e1d8f278-1a22-4294-ac7e-e12d554aef7f,DISK], DatanodeInfoWithStorage[10.0.0.186:50010,DS-5d9aeb2b-e677-41cd-844e-4b36b3c84092,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK], DatanodeInfoWithStorage[10.0.0.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
5. BP-1552885336-10.0.0.178-1446159880991:blk_1084952880_17800120 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.181:50010,DS-79185b75-1938-4c91-a6d0-bb6687ca7e56,DISK], DatanodeInfoWithStorage[10.0.0.184:50010,DS-dcbd20aa-0334-49e0-b807-d2489f5923c6,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-f1d77328-f3af-483e-82e9-66ab0723a52c,DISK]]
6. BP-1552885336-10.0.0.178-1446159880991:blk_1084952887_17800316 len=115289753 MISSING!
Status: CORRUPT
Total size: 920596121 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 7 (avg. block size 131513731 B)
********************************
UNDER MIN REPL'D BLOCKS: 1 (14.285714 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 115289753 B
********************************
Minimally replicated blocks: 6 (85.71429 %)
Over-replicated blocks: 2 (28.571428 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.857143
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 10
Number of racks: 1
FSCK ended at Mon Oct 10 17:12:25 MSK 2016 in 0 milliseconds
The filesystem under path '/hadoop/files/load_tarifer-zf-4_20160902165521521.csv' is CORRUPT
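Every figure in the fsck summary can be re-derived from the seven block lines, which makes it straightforward to script the same check across many open files. A minimal Python sketch, assuming the per-block line format shown in this report (`len=`, an optional `repl=`, and `MISSING!` on a block with no reported replicas, printed on the same line as the block ID); `summarize_blocks` and `BlockSummary` are hypothetical names, not part of any Hadoop tool:

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Assumed format of a per-block line in `fsck -files -blocks -locations` output, e.g.
#   "0. BP-...:blk_1084952841_17798971 len=134217728 repl=4 [DatanodeInfoWithStorage[...]]"
#   "6. BP-...:blk_1084952887_17800316 len=115289753 MISSING!"
BLOCK_RE = re.compile(
    r"^\s*\d+\.\s+\S+:(?P<blk>blk_\d+_\d+)\s+len=(?P<len>\d+)"
    r"(?:\s+repl=(?P<repl>\d+))?(?P<missing>\s+MISSING!)?"
)


@dataclass
class BlockSummary:
    total_bytes: int
    blocks: int
    avg_block_size: int
    avg_replication: float
    over_replicated: int
    missing: List[Tuple[str, int]]


def summarize_blocks(lines, target_repl=3):
    """Recompute fsck-style totals from the per-block lines of one file.

    target_repl is the file's replication factor (assumed 3, the default in the report).
    """
    lengths, repls, missing = [], [], []
    for line in lines:
        m = BLOCK_RE.match(line)
        if not m:
            continue  # skip header and summary lines
        length = int(m.group("len"))
        lengths.append(length)
        # A block printed as MISSING! has no repl= field; count it as 0 live replicas.
        repls.append(int(m.group("repl")) if m.group("repl") else 0)
        if m.group("missing"):
            missing.append((m.group("blk"), length))
    n = len(lengths)
    return BlockSummary(
        total_bytes=sum(lengths),
        blocks=n,
        avg_block_size=sum(lengths) // n if n else 0,  # truncated, as in the report
        avg_replication=sum(repls) / n if n else 0.0,
        over_replicated=sum(1 for r in repls if r > target_repl),
        missing=missing,
    )


# Block lines from the report above, with the location lists omitted.
BP = "BP-1552885336-10.0.0.178-1446159880991"
REPORT_LINES = [
    f"0. {BP}:blk_1084952841_17798971 len=134217728 repl=4",
    f"1. {BP}:blk_1084952850_17799207 len=134217728 repl=3",
    f"2. {BP}:blk_1084952857_17799489 len=134217728 repl=3",
    f"3. {BP}:blk_1084952866_17799725 len=134217728 repl=3",
    f"4. {BP}:blk_1084952872_17799891 len=134217728 repl=4",
    f"5. {BP}:blk_1084952880_17800120 len=134217728 repl=3",
    f"6. {BP}:blk_1084952887_17800316 len=115289753 MISSING!",
]

if __name__ == "__main__":
    print(summarize_blocks(REPORT_LINES))
```

Running it reproduces the report's totals: 6 × 134217728 + 115289753 = 920596121 bytes, an average block size of 131513731 B, an average replication of 20/7 ≈ 2.857143, and two over-replicated blocks (blocks 0 and 4, each with repl=4).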
The file is UNDER_RECOVERY: the NameNode considers the last block to be COMMITTED, while the DataNode holds it in the RBW state. Recovery was never executed. The last block file and its meta file still exist in the 'rbw' directory on the DataNode:
-rw-r--r-- 1 hdfs hdfs 115289753 Sep 2 16:56 /hadoopdir/data/current/BP-1552885336-10.0.0.178-1446159880991/current/rbw/blk_1084952887
-rw-r--r-- 1 hdfs hdfs 900711 Sep 2 16:56 /hadoopdir/data/current/BP-1552885336-10.0.0.178-1446159880991/current/rbw/blk_1084952887_17800316.meta
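The two sizes in this listing agree with each other, which suggests the DataNode holds checksums for all 115289753 bytes that fsck lists for the last block. A worked check in Python, assuming the default `dfs.bytes-per-checksum` of 512, 4-byte CRC checksums, and a 7-byte meta file header; these are assumptions about the cluster's configuration, not values taken from the report:

```python
# Assumed layout of an HDFS block meta file (not stated in the report):
# a 7-byte header (2-byte version, 1-byte checksum type, 4-byte bytes-per-checksum)
# followed by one 4-byte CRC per data chunk.
HEADER_BYTES = 7
BYTES_PER_CHECKSUM = 512  # assumed default dfs.bytes-per-checksum
CHECKSUM_SIZE = 4         # CRC32 and CRC32C checksums are 4 bytes each


def expected_meta_size(block_len: int) -> int:
    """Meta file size when checksums cover all block_len bytes of the replica."""
    # Ceiling division: a partial final chunk still gets its own CRC.
    chunks = -(-block_len // BYTES_PER_CHECKSUM)
    return HEADER_BYTES + chunks * CHECKSUM_SIZE


def covered_bytes(meta_len: int) -> int:
    """Upper bound on the data bytes protected by a meta file of meta_len bytes."""
    return (meta_len - HEADER_BYTES) // CHECKSUM_SIZE * BYTES_PER_CHECKSUM


if __name__ == "__main__":
    block_len, meta_len = 115289753, 900711  # sizes from the ls listing above
    print(expected_meta_size(block_len) == meta_len)  # True
    print(covered_bytes(meta_len))                    # 115290112
```

Under these assumptions, 115289753 bytes span 225176 chunks (the last one partial), and 7 + 225176 × 4 = 900711, exactly the size of the `.meta` file.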
The lease recovery tool reported:
hdfs debug recoverLease -path /hadoop/files/load_tarifer-zf-4_20160902165521521.csv
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
recoverLease got exception:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to RECOVER_LEASE /hadoop/files/load_tarifer-zf-4_20160902165521521.csv for DFSClient_NONMAPREDUCE_-1462314354_1 on 10.0.0.178 because the file is under construction but no leases found.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2892)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2835)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:668)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:663)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.recoverLease(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.recoverLease(ClientNamenodeProtocolTranslatorPB.java:603)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.recoverLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.recoverLease(DFSClient.java:1259)
at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:279)
at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:275)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.recoverLease(DistributedFileSystem.java:275)
at org.apache.hadoop.hdfs.tools.DebugAdmin$RecoverLeaseCommand.run(DebugAdmin.java:256)
at org.apache.hadoop.hdfs.tools.DebugAdmin.run(DebugAdmin.java:336)
at org.apache.hadoop.hdfs.tools.DebugAdmin.main(DebugAdmin.java:359)
Giving up on recoverLease for /hadoop/files/load_tarifer-zf-4_20160902165521521.csv after 1 try.
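The exception states the core inconsistency: the NameNode still marks the file as under construction, yet its lease manager holds no lease for it. Because automatic lease recovery on the NameNode is driven from its table of leases, such a file is never closed on its own, which is consistent with the title of the linked HDFS-10763. The following toy model illustrates only that interaction; it is not HDFS code, and `ToyNamespace`, `lose_lease`, `expire_leases`, and `recover_lease` are hypothetical names:

```python
class AlreadyBeingCreatedError(Exception):
    """Toy stand-in for org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException."""


class ToyNamespace:
    """Hypothetical, heavily simplified model of the NameNode's open-file bookkeeping."""

    def __init__(self):
        self.under_construction = set()  # paths whose inode is still open for write
        self.leases = {}                 # path -> lease holder (client name)

    def create(self, path, holder):
        self.under_construction.add(path)
        self.leases[path] = holder

    def lose_lease(self, path):
        # Models the inconsistency: the lease is removed, but the inode stays open.
        self.leases.pop(path, None)

    def expire_leases(self):
        """Toy lease monitor: treats every lease as expired and closes its file.

        It only visits paths that still have a lease, so an open file whose
        lease was lost is never found here.
        """
        closed = []
        for path in list(self.leases):
            del self.leases[path]
            self.under_construction.discard(path)
            closed.append(path)
        return closed

    def recover_lease(self, path, holder):
        if path not in self.under_construction:
            return True  # already closed
        if path not in self.leases:
            raise AlreadyBeingCreatedError(
                f"Failed to RECOVER_LEASE {path} for {holder} because the file "
                "is under construction but no leases found.")
        # The toy closes the file at once; real HDFS starts block recovery instead.
        del self.leases[path]
        self.under_construction.discard(path)
        return True


if __name__ == "__main__":
    ns = ToyNamespace()
    ns.create("/tmp/ok.csv", "client-A")      # hypothetical paths and client names
    ns.create("/tmp/leaked.csv", "client-B")
    ns.lose_lease("/tmp/leaked.csv")
    print(ns.expire_leases())                 # ['/tmp/ok.csv']
    print(sorted(ns.under_construction))      # ['/tmp/leaked.csv']
    try:
        ns.recover_lease("/tmp/leaked.csv", "client-C")
    except AlreadyBeingCreatedError as e:
        print(e)
```

The toy closes files synchronously to keep the example short; in HDFS, `recoverLease` returns `true` only if the file is already closed, and otherwise starts recovery and returns `false`.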
Issue Links
duplicates
HDFS-10763 Open files can leak permanently due to inconsistent lease update (Closed)