[HBASE-20157] WAL file might get broken - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.1.0
Fix Version/s: 2.0.0
Component/s: wal
Labels: None

Description

A WAL file can be corrupted by the bug described in ~~HBASE-16824~~.
When Writer.close() and Writer.sync() are called at the same time, an HDFS bug (HDFS-13243) is triggered. When that happens, the last block of the WAL is broken (the NameNode marks it as a corrupt block).

I am reporting this scenario to help others who run into the same problem. (~~HBASE-16824~~ has already been fixed, though.)
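
To illustrate the race described above, here is a minimal sketch of one way to keep close() and sync() from overlapping on the client side. It is not the HBASE-16824 patch and not HBase's actual writer: the `GuardedWriter` class and its nested `Writer` interface are hypothetical names introduced only for this example, and the approach (one lock plus a closed flag) is an assumption about how such a guard could look.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: serializes sync() and close() on a wrapped writer. */
public class GuardedWriter implements Closeable {

    /** Minimal stand-in for a WAL writer; only the two calls from the report. */
    public interface Writer extends Closeable {
        void sync() throws IOException;
    }

    private final Writer delegate;
    private final ReentrantLock lock = new ReentrantLock();
    private boolean closed = false; // guarded by lock

    public GuardedWriter(Writer delegate) {
        this.delegate = delegate;
    }

    /** Issues a sync unless the writer is already closed; returns whether it ran. */
    public boolean sync() throws IOException {
        lock.lock();
        try {
            if (closed) {
                return false; // never touch a stream that close() has finished with
            }
            delegate.sync();
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Closes the wrapped writer at most once, never while a sync is in flight. */
    @Override
    public void close() throws IOException {
        lock.lock();
        try {
            if (closed) {
                return;
            }
            closed = true;
            delegate.close();
        } finally {
            lock.unlock();
        }
    }
}
```

Holding the lock for the duration of a sync trades some throughput for safety, so this is only a sketch of the ordering constraint. It does not address HDFS-13243 itself, which is on the HDFS side.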

RS log

2018-02-05 07:58:54,212 INFO [regionserver/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218:16020.logRoller] hdfs.DFSClient: Could not complete /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683 retrying...
2018-02-05 07:59:00,612 INFO [regionserver/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218:16020.logRoller] hdfs.DFSClient: Could not complete /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683 retrying...

NN log

2018-02-05 07:58:48,011 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683 for DFSClient_NONMAPREDUCE_1109936977_1
2018-02-05 07:58:48,011 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* blk_1080650145_6909339{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-a4e579e7-4721-4c22-9b61-f1d00b33c45f:NORMAL:10.0.0.218:50010|RBW], ReplicaUC[[DISK]DS-5d3d7878-876d-4a5a-97bc-5535c4cf8d59:NORMAL:10.0.0.220:50010|RBW], ReplicaUC[[DISK]DS-ccc314b2-e2ad-4c1f-99a5-a39e3677a83b:NORMAL:10.0.0.221:50010|RBW]]} is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 2) in file /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
2018-02-05 07:58:48,111 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 10.0.0.221:50010 by hb-j5e517al6xib80rkb-005.hbase.rds.aliyuncs.com/10.0.0.221 because block is COMMITTED and reported length 1957330 does not match length in block map 80594
2018-02-05 07:58:48,224 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 10.0.0.218:50010 by hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218 because block is COMMITTED and reported length 1957330 does not match length in block map 80594
2018-02-05 07:58:48,224 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 10.0.0.220:50010 by hb-j5e517al6xib80rkb-003.hbase.rds.aliyuncs.com/10.0.0.220 because block is COMMITTED and reported length 1957330 does not match length in block map 80594
2018-02-05 07:58:48,511 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* blk_1080650145_6909339{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-a4e579e7-4721-4c22-9b61-f1d00b33c45f:NORMAL:10.0.0.218:50010|RBW], ReplicaUC[[DISK]DS-5d3d7878-876d-4a5a-97bc-5535c4cf8d59:NORMAL:10.0.0.220:50010|RBW], ReplicaUC[[DISK]DS-ccc314b2-e2ad-4c1f-99a5-a39e3677a83b:NORMAL:10.0.0.221:50010|RBW]]} is not COMPLETE (ucState = COMMITTED, replication# = 3 >= minimum = 2) in file /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
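
For anyone checking whether their cluster hit the same symptom, the sketch below pulls the block ID, DataNode address, and the two mismatching lengths out of an `addToCorruptReplicasMap` line like the ones in the NN log above. The `CorruptReplicaScan` class is a hypothetical helper, and the regular expression is written against the exact wording of these log lines; other Hadoop versions may word the message differently.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts length mismatches from NameNode log lines. */
public class CorruptReplicaScan {

    /** One corrupt-replica report as logged by the NameNode. */
    public static final class Mismatch {
        public final String blockId;
        public final String dataNode;
        public final long reportedLength;
        public final long blockMapLength;

        Mismatch(String blockId, String dataNode, long reportedLength, long blockMapLength) {
            this.blockId = blockId;
            this.dataNode = dataNode;
            this.reportedLength = reportedLength;
            this.blockMapLength = blockMapLength;
        }
    }

    // Matches the message format shown in the NN log excerpt of this report.
    private static final Pattern CORRUPT_LINE = Pattern.compile(
        "addToCorruptReplicasMap: (blk_\\d+) added as corrupt on (\\S+) by .*"
            + "reported length (\\d+) does not match length in block map (\\d+)");

    /** Returns the parsed mismatch, or empty if the line is not such a report. */
    public static Optional<Mismatch> parse(String line) {
        Matcher m = CORRUPT_LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Mismatch(
            m.group(1),
            m.group(2),
            Long.parseLong(m.group(3)),
            Long.parseLong(m.group(4))));
    }
}
```

In the log above, all three replicas report length 1957330 while the block map holds 80594, which is the pattern to look for: the block was committed with a stale length while data was still being written.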

Issue Links

duplicates

HBASE-16824 Writer.flush() can be called on already closed streams in WAL roll

Closed

People

Assignee: Zephyr Guo

Reporter: Zephyr Guo

Votes: 0

Watchers: 3

Dates

Created: 08/Mar/18 06:55

Updated: 01/Aug/18 06:21

Resolved: 08/Mar/18 07:57