Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-7453

Namenode does not recognize block is missing on a datanode

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Duplicate
    • 2.4.1
    • None
    • namenode
    • None

    Description

      We got a severe datalose due to the fact that namenode didn't recognized block it thinks exist are not actually exist on the datanodes. The senario is something as follows:

      1. A disk fails on one of the datanodes
      2. The datanode is forced to decommiss and shut down
      3. The disk is replaced and datanode is back again
      4. fsck shows everything is fine
      5. Repeat 1-4 for few weeks
      6. Restart the namenode
      7. It suddenly sees tens of thousands under-replicated blocks and hundreds missing blocks

      During the next disk failure we analysed situation a bit more and found particular block on particular datanode that is missing: there is no file for block and if we try to read it, we got

      java.io.IOException: Got error for OP_READ_BLOCK, self=/XXX:33817, remote=XXX/X.X.X.X:50010, for file XXX/X.X.X.X:50010:BP-879324367-YYY-1404837025894:1083356878, for pool BP-879324367-YYY-1404837025894 block 1083356878_9644290
      

      We restarted the datanode and in the log we can see that it did scan all the directories and send the report to namenode:

      2014-11-27 17:06:34,174  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /YYY:8020] FsDatasetImpl - Adding block pool BP-879324367-YYY-1404837025894
      2014-11-27 17:06:34,175  INFO [Thread-41] FsDatasetImpl - Scanning block pool BP-879324367-YYY-1404837025894 on volume /mnt/hadoop/0/dfs/data/current...
      2014-11-27 17:06:34,176  INFO [Thread-43] FsDatasetImpl - Scanning block pool BP-879324367-YYY-1404837025894 on volume /mnt/hadoop/2/dfs/data/current...
      2014-11-27 17:06:34,176  INFO [Thread-42] FsDatasetImpl - Scanning block pool BP-879324367-YYY-1404837025894 on volume /mnt/hadoop/1/dfs/data/current...
      2014-11-27 17:06:34,279  INFO [Thread-42] FsDatasetImpl - Cached dfsUsed found for /mnt/hadoop/1/dfs/data/current/BP-879324367-YYY-1404837025894/current: 62677866794
      2014-11-27 17:06:34,282  INFO [Thread-42] FsDatasetImpl - Time taken to scan block pool BP-879324367-YYY-1404837025894 on /mnt/hadoop/1/dfs/data/current: 105ms
      
      2014-11-27 17:06:34,744  INFO [Thread-41] FsDatasetImpl - Cached dfsUsed found for /mnt/hadoop/0/dfs/data/current/BP-879324367-YYY-1404837025894/current: 2465590681232
      2014-11-27 17:06:34,744  INFO [Thread-41] FsDatasetImpl - Time taken to scan block pool BP-879324367-YYY-1404837025894 on /mnt/hadoop/0/dfs/data/current: 568ms
      2014-11-27 17:06:34,856  INFO [Thread-43] FsDatasetImpl - Cached dfsUsed found for /mnt/hadoop/2/dfs/data/current/BP-879324367-YYY-1404837025894/current: 2475580099468
      2014-11-27 17:06:34,857  INFO [Thread-43] FsDatasetImpl - Time taken to scan block pool BP-879324367-YYY-1404837025894 on /mnt/hadoop/2/dfs/data/current: 680ms
      2014-11-27 17:06:34,857  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /YYY:8020] 
      2014-11-27 17:06:34,858  INFO [Thread-44] FsDatasetImpl - Adding replicas to map for block pool BP-879324367-YYY-1404837025894 on volume /mnt/hadoop/0/dfs/data/current...
      2014-11-27 17:06:34,890  INFO [Thread-46] FsDatasetImpl - Adding replicas to map for block pool BP-879324367-YYY-1404837025894 on volume /mnt/hadoop/2/dfs/data/current...
      2014-11-27 17:06:34,890  INFO [Thread-45] FsDatasetImpl - Adding replicas to map for block pool BP-879324367-YYY-1404837025894 on volume /mnt/hadoop/1/dfs/data/current...
      2014-11-27 17:06:34,961  INFO [Thread-45] FsDatasetImpl - Time to add replicas to map for block pool BP-879324367-YYY-1404837025894 on volume /mnt/hadoop/1/dfs/data/current: 70ms
      2014-11-27 17:06:36,083  INFO [Thread-44] FsDatasetImpl - Time to add replicas to map for block pool BP-879324367-YYY-1404837025894 on volume /mnt/hadoop/0/dfs/data/current: 1193ms
      2014-11-27 17:06:36,162  INFO [Thread-46] FsDatasetImpl - Time to add replicas to map for block pool BP-879324367-YYY-1404837025894 on volume /mnt/hadoop/2/dfs/data/current: 1271ms
      2014-11-27 17:06:36,162  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /YYY:8020] FsDatasetImpl - Total time to add all replicas to map: 1304ms
      2014-11-27 17:06:36,167  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /YYY:8020] DataNode - Block pool BP-879324367-YYY-1404837025894 (Datanode Uuid null) service to /YYY:8020 beginning handshake with NN
      2014-11-27 17:06:36,167  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /XXX:8020] DataNode - Block pool BP-879324367-YYY-1404837025894 (Datanode Uuid null) service to /XXX:8020 beginning handshake with NN
      2014-11-27 17:06:36,185  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /YYY:8020] DataNode - Block pool Block pool BP-879324367-YYY-1404837025894 (Datanode Uuid null) service to /YYY:8020 successfully registered with NN
      2014-11-27 17:06:36,185  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /XXX:8020] DataNode - Block pool Block pool BP-879324367-YYY-1404837025894 (Datanode Uuid null) service to /XXX:8020 successfully registered with NN
      2014-11-27 17:06:36,186  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /YYY:8020] DataNode - For namenode /YYY:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL of 21600000msec CACHEREPORT_INTERVAL of 10000msec Initial delay: 0msec; heartBeatInterval=3000
      2014-11-27 17:06:36,186  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /XXX:8020] DataNode - For namenode /XXX:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL of 21600000msec CACHEREPORT_INTERVAL of 10000msec Initial delay: 0msec; heartBeatInterval=3000
      2014-11-27 17:06:36,264  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /YYY:8020] DataNode - Namenode Block pool BP-879324367-YYY-1404837025894 (Datanode Uuid 91e8deeb-7030-4e7b-af14-a3f35cf65bd2) service to /YYY:8020 trying to claim ACTIVE state with txid=114828921
      2014-11-27 17:06:36,265  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /YYY:8020] DataNode - Acknowledging ACTIVE Namenode Block pool BP-879324367-YYY-1404837025894 (Datanode Uuid 91e8deeb-7030-4e7b-af14-a3f35cf65bd2) service to /YYY:8020
      2014-11-27 17:06:36,511  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /XXX:8020] DataNode - Sent 1 blockreports 49708 blocks total. Took 72 msec to generate and 174 msecs for RPC and NN processing.  Got back commands none
      2014-11-27 17:06:36,511  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /YYY:8020] DataNode - Sent 1 blockreports 49708 blocks total. Took 76 msec to generate and 170 msecs for RPC and NN processing.  Got back commands org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@464c42e6
      2014-11-27 17:06:36,512  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /YYY:8020] DataNode - Got finalize command for block pool BP-879324367-YYY-1404837025894
      2014-11-27 17:06:36,519  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /XXX:8020] GSet - Computing capacity for map BlockMap
      2014-11-27 17:06:36,523  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /XXX:8020] GSet - capacity      = 2^20 = 1048576 entries
      2014-11-27 17:06:36,520  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /XXX:8020] GSet - VM type       = 64-bit
      2014-11-27 17:06:36,522  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /XXX:8020] GSet - 0.5% max memory 1.9 GB = 9.9 MB
      2014-11-27 17:06:36,524  INFO [DataNode: [[[DISK]file:/mnt/hadoop/0/dfs/data/, [DISK]file:/mnt/hadoop/1/dfs/data/, [DISK]file:/mnt/hadoop/2/dfs/data/]]  heartbeating to /XXX:8020] BlockPoolSliceScanner - Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-879324367-YYY-1404837025894
      

      But block is sitll reported with two replicas, one of those is assigned to datanode which hasn't it:

      <block_info>
      <block_id>1083356878</block_id>
      <block_name>blk_1083356878</block_name>
      <file>
      <local_name>commons-lang-2.6.jar</local_name>
      <local_directory>/user/user/analysis</local_directory>
      <user_name>user</user_name>
      <group_name>supergroup</group_name>
      <is_directory>false</is_directory>
      <access_time>1417094143429</access_time>
      <is_under_construction>false</is_under_construction>
      <ds_quota>-1</ds_quota>
      <permission_status>user:supergroup:rw-r--r--</permission_status>
      <replication>2</replication>
      <disk_space_consumed>568440</disk_space_consumed>
      <preferred_block_size>134217728</preferred_block_size>
      </file>
      <replicas>
      <replica>
      <host_name>bad_node</host_name>
      <is_corrupt>false</is_corrupt>
      </replica>
      <replica>
      <host_name>good_node</host_name>
      <is_corrupt>false</is_corrupt>
      </replica>
      </replicas>
      </block_info>
      

      fsck does not change the situation, ony the namenode restart force it to recognize that block replica as missing.

      As a result it is required to restart namenode after each disk failure which is not very practical...

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              dmitrybugaychenko Dmitry Bugaychenko
              Votes:
              2 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: