Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Won't Fix
Affects Version/s: 0.95.2
Fix Version/s: None
Component/s: None
Environment: all
Description
This comes from an HDFS bug that has been fixed in some HDFS versions; I haven't found the HDFS JIRA for it.
Context: the HBase Write-Ahead Log (WAL) feature, which uses HDFS append. If the node crashes, the file that was being written is read by other processes to replay the actions.
- So in HDFS we have one (dead) process that was writing and another process reading the same file.
- But despite the call to syncFs, we don't always see the data when a node is dead. This seems to be because DFSClient#updateBlockInfo ignores IPC errors and sets the block length to 0 (see the paraphrased sketch after this list).
- So we may miss all the writes to the last block if we happen to ask the dead DN for its length.
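For illustration, the problematic pattern looks roughly like the following. This is a paraphrased sketch, not the actual branch-1 source; the names primary, getBlockInfo, lastBlock, and setLastBlockSize are approximations of the real identifiers.

// Paraphrased fragment of the pattern in DFSClient#updateBlockInfo
// (branch-1); names are approximate, not the actual source.
long newBlockSize = 0;                 // fallback length
try {
  // If 'primary' is the dead datanode, this IPC call throws an IOException...
  newBlockSize = primary.getBlockInfo(lastBlock).getNumBytes();
} catch (IOException e) {
  // ...which is swallowed here: newBlockSize silently stays 0, so the
  // reader believes the last block is empty and misses every edit that
  // was synced into it.
}
locatedBlocks.setLastBlockSize(newBlockSize);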
HDFS 1.0.3, branch-1, or branch-1-win: we have the issue.
http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup
HDFS branch-2 or trunk: we should not have the issue (not tested, though).
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup
The attached test fails ~50% of the time.
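The attached test is not reproduced here, but a minimal sketch of this kind of reproduction could look like the following, assuming the Hadoop 1.x (branch-1) test APIs (MiniDFSCluster, FSDataOutputStream#sync); the file name and sizes are arbitrary.

// Minimal reproduction sketch against branch-1 test APIs; NOT the attached test.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class DeadDataNodeWALReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.support.append", true); // branch-1 append support
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 3, true, null);
    try {
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();
      Path wal = new Path("/wal");

      // "Dead" writer: write edits and sync, but never close the file.
      FSDataOutputStream out = fs.create(wal);
      out.write(new byte[4096]);
      out.sync(); // hflush equivalent on branch-1; data reaches the DN pipeline

      // Kill one datanode of the pipeline holding the under-construction block.
      cluster.stopDataNode(0);

      // Replaying reader (simplified: a real test would use a separate client
      // so it fetches fresh block lengths from the datanodes). Whether it sees
      // the 4096 synced bytes depends on which DN the client happens to ask
      // for the last block's length: if it picks the dead one, the IPC error
      // is ignored, the length comes back as 0, and the read returns nothing,
      // which matches the intermittent failure described above.
      FSDataInputStream in = fs.open(wal);
      int n = in.read(new byte[4096]);
      System.out.println("read " + n + " bytes (expected 4096)");
      in.close();
    } finally {
      cluster.shutdown();
    }
  }
}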
Attachments
Issue Links
- is related to: HDFS-3701 HDFS may miss the final block when reading a file opened for writing if one of the datanode is dead (Closed)
- relates to: HBASE-6435 Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes (Closed)
- relates to: HBASE-5843 Improve HBase MTTR - Mean Time To Recover (Closed)