  1. Hadoop HDFS
  2. HDFS-15589

Large PostponedMisreplicatedBlocks count does not decrease promptly when the NameNode is started after the DataNodes


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels:
      None
    • Environment:

      CentOS 7

      Description

      In our test cluster, I restarted the NameNode. Afterwards I saw a large number of PostponedMisreplicatedBlocks, and the count did not decrease immediately.

      I searched the NameNode log and found the following entries:

      2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: from DatanodeRegistration(xx.xx.xx.xx:9866, datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), reports.length=12
      2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: from DatanodeRegistration(xx.xx.xx.xx:9866, datanodeUuid=aee144f1-2082-4bca-a92b-f3c154a71c65, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), reports.length=12
      2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: from DatanodeRegistration(xx.xx.xx.xx:9866, datanodeUuid=d152fa5b-1089-4bfc-b9c4-e3a7d98c7a7b, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), reports.length=12
      2020-09-21 17:02:37,156 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: from DatanodeRegistration(xx.xx.xx.xx:9866, datanodeUuid=5cffc1fe-ace9-4af8-adfc-6002a7f5565d, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), reports.length=12
      2020-09-21 17:02:37,161 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: from DatanodeRegistration(xx.xx.xx.xx:9866, datanodeUuid=9980d8e1-b0d9-4657-b97d-c803f82c1459, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), reports.length=12
      2020-09-21 17:02:37,197 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: from DatanodeRegistration(xx.xx.xx.xx:9866, datanodeUuid=77ff3f5e-37f0-405f-a16c-166311546cae, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), reports.length=12
      
      

      Note: the test cluster has only 6 DataNodes.

      These block reports are processed before the "Marking all datanodes as stale" message, which is logged by startActiveServices. However, DatanodeStorageInfo.blockContentsStale is set to false only when a block report is processed, and startActiveServices then marks every DataNode as stale again. As a result, the DataNodes remain stale until their next block report, and PostponedMisreplicatedBlocks stays at a large value until then.
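
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the classes `StaleFlagSketch`, `Storage`, and `NameNodeModel` and their methods are hypothetical and are not the Hadoop implementation. They only mimic the idea that a block report clears a per-storage stale flag and that a later "mark all stale" step sets it again.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every class and method here is hypothetical, not Hadoop code.
class StaleFlagSketch {

    /** Stand-in for a storage whose contents are trusted only after a block report. */
    static final class Storage {
        // Mimics the idea behind DatanodeStorageInfo.blockContentsStale; starts stale.
        boolean blockContentsStale = true;
    }

    static final class NameNodeModel {
        final List<Storage> storages = new ArrayList<>();

        NameNodeModel(int datanodes) {
            for (int i = 0; i < datanodes; i++) {
                storages.add(new Storage());
            }
        }

        /** In this model, a block report is the only event that clears the flag. */
        void processBlockReport(int index) {
            storages.get(index).blockContentsStale = false;
        }

        /** Models the "Marking all datanodes as stale" step. */
        void markAllDatanodesStale() {
            for (Storage s : storages) {
                s.blockContentsStale = true;
            }
        }

        /** Number of storages still stale; their blocks stay postponed in this model. */
        long staleCount() {
            return storages.stream().filter(s -> s.blockContentsStale).count();
        }
    }

    /** Block reports arrive first, then all nodes are marked stale (the order in the log). */
    static long reportsThenMarkStale(int datanodes) {
        NameNodeModel nn = new NameNodeModel(datanodes);
        for (int i = 0; i < datanodes; i++) {
            nn.processBlockReport(i);
        }
        nn.markAllDatanodesStale();
        return nn.staleCount();
    }

    /** All nodes are marked stale first, then block reports arrive. */
    static long markStaleThenReports(int datanodes) {
        NameNodeModel nn = new NameNodeModel(datanodes);
        nn.markAllDatanodesStale();
        for (int i = 0; i < datanodes; i++) {
            nn.processBlockReport(i);
        }
        return nn.staleCount();
    }

    public static void main(String[] args) {
        System.out.println("reports then mark stale: " + reportsThenMarkStale(6));
        System.out.println("mark stale then reports: " + markStaleThenReports(6));
    }
}
```

In the first ordering all 6 modeled storages end up stale and stay that way until another report arrives, which matches the behavior reported here. In the second ordering none remain stale.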

            People

            • Assignee: Unassigned
            • Reporter: zhengchenyu
            • Votes: 0
            • Watchers: 3
