HBase / HBASE-20157

WAL file might get broken


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1.0
    • Fix Version/s: 2.0.0
    • Component/s: wal
    • Labels: None

    Description

      A WAL file can be corrupted as a result of HBASE-16824.
      When Writer.close() and Writer.sync() are called at the same time, an HDFS bug (HDFS-13243) is triggered. When that happens, the last block of the WAL is broken: the NameNode (NN) marks it as a corrupt block.

      I am reporting this scenario here to help others who run into the same problem. (HBASE-16824 has already been fixed, though.)
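
To make the race in the description concrete, here is a minimal Java sketch of one way to keep a sync and a close from overlapping on the same writer. It is not the HBASE-16824 patch and not the HDFS-13243 fix. `GuardedWriter` is a hypothetical class invented for illustration, and a plain `java.io.OutputStream` with `flush()` stands in for the HDFS output stream and its sync call.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical wrapper (not an HBase or HDFS class) that serializes sync() and close(). */
final class GuardedWriter implements Closeable {
    private final OutputStream out;              // stand-in for the real WAL output stream
    private final ReentrantLock lock = new ReentrantLock();
    private boolean closed = false;              // guarded by lock

    GuardedWriter(OutputStream out) {
        this.out = out;
    }

    /** Appends bytes; fails if the writer has already been closed. */
    void append(byte[] data) throws IOException {
        lock.lock();
        try {
            if (closed) {
                throw new IOException("writer already closed");
            }
            out.write(data);
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if a flush was issued, false if the writer was already closed. */
    boolean sync() throws IOException {
        lock.lock();
        try {
            if (closed) {
                return false;                    // never touch a closed stream
            }
            out.flush();                         // assumption: flush() stands in for the real sync
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Closes the stream once; later calls are no-ops. */
    @Override
    public void close() throws IOException {
        lock.lock();
        try {
            if (closed) {
                return;
            }
            closed = true;
            out.close();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        GuardedWriter writer = new GuardedWriter(new ByteArrayOutputStream());
        writer.append("edit-1".getBytes(StandardCharsets.UTF_8));

        // A sync racing with close: the lock forces one to finish before the other starts.
        Thread syncer = new Thread(() -> {
            try {
                writer.sync();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        syncer.start();
        writer.close();
        syncer.join();

        System.out.println("sync after close flushed: " + writer.sync());
    }
}
```

With this shape, a sync that loses the race to close() becomes a no-op instead of reaching the underlying stream. Whether a late sync should be silently skipped or reported as an error is a design choice. The sketch returns false so the caller can decide.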

      RS log

      2018-02-05 07:58:54,212 INFO [regionserver/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218:16020.logRoller] hdfs.DFSClient: Could not complete /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683 retrying...
      2018-02-05 07:59:00,612 INFO [regionserver/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218:16020.logRoller] hdfs.DFSClient: Could not complete /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683 retrying...

      NN log

      2018-02-05 07:58:48,011 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683 for DFSClient_NONMAPREDUCE_1109936977_1
      2018-02-05 07:58:48,011 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* blk_1080650145_6909339{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-a4e579e7-4721-4c22-9b61-f1d00b33c45f:NORMAL:10.0.0.218:50010|RBW], ReplicaUC[[DISK]DS-5d3d7878-876d-4a5a-97bc-5535c4cf8d59:NORMAL:10.0.0.220:50010|RBW], ReplicaUC[[DISK]DS-ccc314b2-e2ad-4c1f-99a5-a39e3677a83b:NORMAL:10.0.0.221:50010|RBW]]} is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 2) in file /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
      2018-02-05 07:58:48,111 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 10.0.0.221:50010 by hb-j5e517al6xib80rkb-005.hbase.rds.aliyuncs.com/10.0.0.221 because block is COMMITTED and reported length 1957330 does not match length in block map 80594
      2018-02-05 07:58:48,224 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 10.0.0.218:50010 by hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218 because block is COMMITTED and reported length 1957330 does not match length in block map 80594
      2018-02-05 07:58:48,224 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 10.0.0.220:50010 by hb-j5e517al6xib80rkb-003.hbase.rds.aliyuncs.com/10.0.0.220 because block is COMMITTED and reported length 1957330 does not match length in block map 80594
      2018-02-05 07:58:48,511 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* blk_1080650145_6909339{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-a4e579e7-4721-4c22-9b61-f1d00b33c45f:NORMAL:10.0.0.218:50010|RBW], ReplicaUC[[DISK]DS-5d3d7878-876d-4a5a-97bc-5535c4cf8d59:NORMAL:10.0.0.220:50010|RBW], ReplicaUC[[DISK]DS-ccc314b2-e2ad-4c1f-99a5-a39e3677a83b:NORMAL:10.0.0.221:50010|RBW]]} is not COMPLETE (ucState = COMMITTED, replication# = 3 >= minimum = 2) in file /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683

          People

            Assignee: gzh1992n Zephyr Guo
            Reporter: gzh1992n Zephyr Guo
            Votes: 0
            Watchers: 3
