[HBASE-6490] 'dfs.client.block.write.retries' value could be increased in HBase - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 2.0.0
Fix Version/s: None
Component/s: master, regionserver
Labels: None
Environment: all

Description

When allocating a new node during a write, HDFS tries up to 'dfs.client.block.write.retries' times (default 3) to write the block. Each time an attempt fails, it goes back to the namenode for a new list, and it raises an error once the number of retries is reached. In HBase, if the error occurs while we are writing an hlog file, it triggers a region server abort (as HBase does not trust the log anymore). For the simple case (a new, and as such empty, log file), this seems to be ok, and we don't lose data. There could be some complex cases if the error occurs on an hlog file that already has multiple blocks written.

The log lines are:
"Exception in createBlockOutputStream", then "Abandoning block " followed by "Excluding datanode " for each retry.
IOException: "Unable to create new block.", once the number of retries is reached.

The probability of occurrence seems quite low: (number of bad nodes / number of nodes)^(number of retries), and it implies that you have a region server without its datanode. But this probability applies to each new block.
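
As a rough illustration of the estimate above, here is a minimal Python sketch that evaluates (number of bad nodes / number of nodes)^(number of retries). The function name and the sample cluster (5 bad datanodes out of 100) are made up for the example, and the formula is only the approximation given in this description, not something taken from the HDFS code.

```python
def block_failure_probability(bad_nodes: int, total_nodes: int, retries: int) -> float:
    """Estimate from the issue description: (bad / total) ** retries.

    Treats every attempt as an independent pick of a bad node, which is
    a simplification of what the HDFS client really does.
    """
    if total_nodes <= 0 or not 0 <= bad_nodes <= total_nodes:
        raise ValueError("need 0 <= bad_nodes <= total_nodes and total_nodes > 0")
    if retries < 1:
        raise ValueError("retries must be at least 1")
    return (bad_nodes / total_nodes) ** retries


if __name__ == "__main__":
    # Hypothetical cluster: 5 bad datanodes out of 100.
    for retries in (3, 5, 10):
        p = block_failure_probability(5, 100, retries)
        print(f"retries={retries}: per-block failure probability ~ {p:.3e}")
```

With the default of 3 retries and 5% of the nodes bad, the estimate is about 1.25e-4 per new block.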

Increasing the default value of 'dfs.client.block.write.retries' in HBase could make sense, to give better coverage in chaotic conditions.
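
To see why the per-block nature of the failure matters when choosing a retry count, the sketch below compounds the per-block estimate over many new blocks. It assumes block allocations fail independently, which is an assumption added here for illustration; the function name, the 5% bad-node fraction, and the 10,000-block workload are all hypothetical.

```python
def abort_probability(bad_fraction: float, retries: int, new_blocks: int) -> float:
    """Chance that at least one of `new_blocks` allocations exhausts its retries.

    Per-block estimate: bad_fraction ** retries (the approximation from the
    description). Blocks are assumed independent, so the chance of at least
    one failure is 1 - (1 - per_block) ** new_blocks.
    """
    if not 0.0 <= bad_fraction <= 1.0:
        raise ValueError("bad_fraction must be between 0 and 1")
    if retries < 1 or new_blocks < 0:
        raise ValueError("retries must be >= 1 and new_blocks >= 0")
    per_block = bad_fraction ** retries
    return 1.0 - (1.0 - per_block) ** new_blocks


if __name__ == "__main__":
    # Hypothetical workload: 10,000 new blocks while 5% of the nodes are bad.
    for retries in (3, 5, 10):
        p = abort_probability(0.05, retries, 10_000)
        print(f"retries={retries}: P(at least one failed block) ~ {p:.3e}")
```

Under these assumptions, a small per-block probability can still add up over a long-running writer, and each extra retry lowers it by a factor equal to the bad-node fraction.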

Issue Links

relates to: HBASE-5843 Improve HBase MTTR - Mean Time To Recover (Closed)

People

Assignee: Unassigned
Reporter: Nicolas Liochon
Votes: 0
Watchers: 6

Dates

Created: 01/Aug/12 16:20
Updated: 13/Jun/22 18:57
Resolved: 13/Jun/22 18:57