  1. Hadoop HDFS
  2. HDFS-10992

file is under construction but no leases found


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.7.1
    • None
    • None
    • None
    • Environment: Hortonworks 2.3 build 2557; 10 DataNodes, 2 NameNodes in auto failover

    Description

      After writing a relatively small number of files to HDFS (at least 1,000, ranging in size from 150 MB to 1.6 GB), we found 13 damaged files, each with an incomplete last block.

      hadoop fsck /hadoop/files/load_tarifer-zf-4_20160902165521521.csv -openforwrite -files -blocks -locations
      DEPRECATED: Use of this script to execute hdfs command is deprecated.
      Instead use the hdfs command for it.

      Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
      Connecting to namenode via http://hadoop-hdfs:50070/fsck?ugi=hdfs&openforwrite=1&files=1&blocks=1&locations=1&path=%2Fstaging%2Flanding%2Fstream%2Fitc_dwh%2Ffiles%2Fload_tarifer-zf-4_20160902165521521.csv
      FSCK started by hdfs (auth:SIMPLE) from /10.0.0.178 for path /hadoop/files/load_tarifer-zf-4_20160902165521521.csv at Mon Oct 10 17:12:25 MSK 2016
      /hadoop/files/load_tarifer-zf-4_20160902165521521.csv 920596121 bytes, 7 block(s), OPENFORWRITE: MISSING 1 blocks of total size 115289753 B
      0. BP-1552885336-10.0.0.178-1446159880991:blk_1084952841_17798971 len=134217728 repl=4 [DatanodeInfoWithStorage[10.0.0.188:50010,DS-9ba44a76-113a-43ac-87dc-46aa97ba3267,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK], DatanodeInfoWithStorage[10.0.0.184:50010,DS-ec462491-6766-490a-a92f-38e9bb3be5ce,DISK], DatanodeInfoWithStorage[10.0.0.182:50010,DS-cef46399-bb70-4f1a-ac55-d71c7e820c29,DISK]]
      1. BP-1552885336-10.0.0.178-1446159880991:blk_1084952850_17799207 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.184:50010,DS-412769e0-0ec2-48d3-b644-b08a516b1c2c,DISK], DatanodeInfoWithStorage[10.0.0.181:50010,DS-97388b2f-c542-417d-ab06-c8d81b94fa9d,DISK], DatanodeInfoWithStorage[10.0.0.187:50010,DS-e7a11951-4315-4425-a88b-a9f6429cc058,DISK]]
      2. BP-1552885336-10.0.0.178-1446159880991:blk_1084952857_17799489 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.184:50010,DS-7a08c597-b0f4-46eb-9916-f028efac66d7,DISK], DatanodeInfoWithStorage[10.0.0.180:50010,DS-fa6a4630-1626-43d8-9988-955a86ac3736,DISK], DatanodeInfoWithStorage[10.0.0.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
      3. BP-1552885336-10.0.0.178-1446159880991:blk_1084952866_17799725 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.185:50010,DS-b5ff8ba0-275e-4846-b5a4-deda35aa0ad8,DISK], DatanodeInfoWithStorage[10.0.0.180:50010,DS-9cb6cade-9395-4f3a-ab7b-7fabd400b7f2,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-e277dcf3-1bce-4efd-a668-cd6fb2e10588,DISK]]
      4. BP-1552885336-10.0.0.178-1446159880991:blk_1084952872_17799891 len=134217728 repl=4 [DatanodeInfoWithStorage[10.0.0.184:50010,DS-e1d8f278-1a22-4294-ac7e-e12d554aef7f,DISK], DatanodeInfoWithStorage[10.0.0.186:50010,DS-5d9aeb2b-e677-41cd-844e-4b36b3c84092,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK], DatanodeInfoWithStorage[10.0.0.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
      5. BP-1552885336-10.0.0.78-1446159880991:blk_1084952880_17800120 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.181:50010,DS-79185b75-1938-4c91-a6d0-bb6687ca7e56,DISK], DatanodeInfoWithStorage[10.0.0.184:50010,DS-dcbd20aa-0334-49e0-b807-d2489f5923c6,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-f1d77328-f3af-483e-82e9-66ab0723a52c,DISK]]
      6. BP-1552885336-10.0.0.178-1446159880991:blk_1084952887_17800316{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-5f3eac72-eb55-4df7-bcaa-a6fa35c166a0:NORMAL:10.0.0.188:50010|RBW], ReplicaUC[[DISK]DS-a2a0d8f0-772e-419f-b4ff-10b4966c57ca:NORMAL:10.0.0.184:50010|RBW], ReplicaUC[[DISK]DS-52984aa0-598e-4fff-acfa-8904ca7b585c:NORMAL:10.0.0.185:50010|RBW]]} len=115289753 MISSING!

      Status: CORRUPT
      Total size: 920596121 B
      Total dirs: 0
      Total files: 1
      Total symlinks: 0
      Total blocks (validated): 7 (avg. block size 131513731 B)
      ********************************
      UNDER MIN REPL'D BLOCKS: 1 (14.285714 %)
      dfs.namenode.replication.min: 1
      CORRUPT FILES: 1
      MISSING BLOCKS: 1
      MISSING SIZE: 115289753 B
      ********************************
      Minimally replicated blocks: 6 (85.71429 %)
      Over-replicated blocks: 2 (28.571428 %)
      Under-replicated blocks: 0 (0.0 %)
      Mis-replicated blocks: 0 (0.0 %)
      Default replication factor: 3
      Average block replication: 2.857143
      Corrupt blocks: 0
      Missing replicas: 0 (0.0 %)
      Number of data-nodes: 10
      Number of racks: 1
      FSCK ended at Mon Oct 10 17:12:25 MSK 2016 in 0 milliseconds

      The filesystem under path '/hadoop/files/load_tarifer-zf-4_20160902165521521.csv' is CORRUPT
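
The sizes in the fsck report above can be cross-checked with a few lines of Python. This is only a sketch: the block lines are shortened by hand to the block ID and the `len=` field, and the helper name `block_lengths` is illustrative, not part of any Hadoop tool. It suggests that the reported total is exactly six full 128 MiB blocks plus the last block, and that the "missing" size is the whole last block.

```python
import re

# Block lines from the fsck run above, shortened to block ID and length.
FSCK_BLOCKS = """
0. blk_1084952841_17798971 len=134217728
1. blk_1084952850_17799207 len=134217728
2. blk_1084952857_17799489 len=134217728
3. blk_1084952866_17799725 len=134217728
4. blk_1084952872_17799891 len=134217728
5. blk_1084952880_17800120 len=134217728
6. blk_1084952887_17800316 len=115289753 MISSING!
"""

REPORTED_TOTAL = 920596121    # "Total size" in the fsck summary
REPORTED_MISSING = 115289753  # "MISSING SIZE" in the fsck summary


def block_lengths(text):
    """Return a list of (length, is_missing) tuples, one per block line."""
    blocks = []
    for line in text.strip().splitlines():
        match = re.search(r"len=(\d+)", line)
        if match:
            blocks.append((int(match.group(1)), "MISSING" in line))
    return blocks


blocks = block_lengths(FSCK_BLOCKS)
total = sum(length for length, _ in blocks)
missing = sum(length for length, is_missing in blocks if is_missing)

print("total matches report:", total == REPORTED_TOTAL)
print("missing size is the last block:", missing == REPORTED_MISSING)
print("average block size:", total // len(blocks))
```

So no data appears to be lost from the earlier blocks; the only block fsck cannot account for is the last one, which is still under construction.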

      The file is UNDER_RECOVERY: the NameNode considers the last block to be in the COMMITTED state, while the DataNodes consider it to be in the RBW state. Recovery is not executed. The last block file and its meta file exist in the 'rbw' directory on the DataNode:
      -rw-r--r-- 1 hdfs hdfs 115289753 Sep 2 16:56 /hadoopdir/data/current/BP-1552885336-10.0.0.178-1446159880991/current/rbw/blk_1084952887
      -rw-r--r-- 1 hdfs hdfs 900711 Sep 2 16:56 /hadoopdir/data/current/BP-1552885336-10.0.0.178-1446159880991/current/rbw/blk_1084952887_17800316.meta
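
As a rough sanity check, the two file sizes in the listing above can be compared against the usual layout of a block meta file. The sketch below assumes a 7-byte meta header, one 4-byte CRC per chunk, and the default `dfs.bytes-per-checksum` of 512; none of these values are stated in this report, and the function name `expected_meta_size` is illustrative. It compares sizes only and does not verify the checksum contents.

```python
# Assumed meta file layout (not taken from this report):
HEADER_BYTES = 7          # 2-byte version + 5-byte checksum header
BYTES_PER_CHECKSUM = 512  # default dfs.bytes-per-checksum
CHECKSUM_BYTES = 4        # one CRC32/CRC32C value per chunk

BLOCK_FILE_SIZE = 115289753  # size of rbw/blk_1084952887 in the listing
META_FILE_SIZE = 900711      # size of the matching .meta file


def expected_meta_size(block_bytes):
    """Size a meta file would have if it covered block_bytes completely."""
    # Integer ceiling division: a partial trailing chunk still gets a checksum.
    chunks = (block_bytes + BYTES_PER_CHECKSUM - 1) // BYTES_PER_CHECKSUM
    return HEADER_BYTES + chunks * CHECKSUM_BYTES


print(expected_meta_size(BLOCK_FILE_SIZE))
print(expected_meta_size(BLOCK_FILE_SIZE) == META_FILE_SIZE)
```

Under these assumptions the meta file size matches the block file size, which is consistent with the replica having been written out in full on the DataNode even though it was never finalized.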

      The lease recovery tool reports:
      hdfs debug recoverLease -path /hadoop/files/load_tarifer-zf-4_20160902165521521.csv
      Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
      recoverLease got exception:
      org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to RECOVER_LEASE /hadoop/files/load_tarifer-zf-4_20160902165521521.csv for DFSClient_NONMAPREDUCE_-1462314354_1 on 10.0.0.178 because the file is under construction but no leases found.
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2892)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2835)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:668)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:663)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)

      at org.apache.hadoop.ipc.Client.call(Client.java:1427)
      at org.apache.hadoop.ipc.Client.call(Client.java:1358)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
      at com.sun.proxy.$Proxy9.recoverLease(Unknown Source)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.recoverLease(ClientNamenodeProtocolTranslatorPB.java:603)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:497)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      at com.sun.proxy.$Proxy10.recoverLease(Unknown Source)
      at org.apache.hadoop.hdfs.DFSClient.recoverLease(DFSClient.java:1259)
      at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:279)
      at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:275)
      at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      at org.apache.hadoop.hdfs.DistributedFileSystem.recoverLease(DistributedFileSystem.java:275)
      at org.apache.hadoop.hdfs.tools.DebugAdmin$RecoverLeaseCommand.run(DebugAdmin.java:256)
      at org.apache.hadoop.hdfs.tools.DebugAdmin.run(DebugAdmin.java:336)
      at org.apache.hadoop.hdfs.tools.DebugAdmin.main(DebugAdmin.java:359)
      Giving up on recoverLease for /hadoop/files/load_tarifer-zf-4_20160902165521521.csv after 1 try.
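
When many files are affected, it may help to separate this failure from other lease recovery errors by scanning the tool's output. The following is a minimal sketch: the helper `classify_recover_lease_output` and its return labels are hypothetical, and the matched strings are taken only from the output shown above.

```python
# Strings copied from the recoverLease output above.
NO_LEASE_MARKER = "the file is under construction but no leases found"
EXCEPTION_MARKER = "recoverLease got exception"


def classify_recover_lease_output(output):
    """Label captured `hdfs debug recoverLease` output (hypothetical helper)."""
    if NO_LEASE_MARKER in output:
        # The case in this report: file is open, but no lease exists for it.
        return "no-lease"
    if EXCEPTION_MARKER in output:
        return "other-exception"
    return "unknown"


sample = (
    "recoverLease got exception:\n"
    "org.apache.hadoop.ipc.RemoteException("
    "org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): "
    "Failed to RECOVER_LEASE /hadoop/files/"
    "load_tarifer-zf-4_20160902165521521.csv for "
    "DFSClient_NONMAPREDUCE_-1462314354_1 on 10.0.0.178 because "
    "the file is under construction but no leases found.\n"
)

print(classify_recover_lease_output(sample))
```

Files labeled `no-lease` are the ones matching this report; retrying the lease recovery tool on them is unlikely to help, since there is no lease for the NameNode to recover.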

            People

              Assignee: Unassigned
              Reporter: Chernishev Aleksandr (cany)