Can you describe this better?
If we look at this in layers, we've got three layers:
Here, layer 3 knows (or guesses) that layer 1 is dead, while the layer in the middle does not know it. That's not a perfect example of encapsulation. HBase is effectively telling HDFS: 'you know, I want some blocks, but maybe this datanode is not good, I'm not sure, but please don't use it'. Kind of strange (but useful short term).
Today, when there is a global issue, HBase starts its recovery while HDFS is still ignoring the issue. This leads to a nightmare of socket exceptions all over the place, as HBase is directed to dead nodes again and again. HDFS should know what's going on before HBase does. So if HBase is set with a timeout of 30s, HDFS should have 20s or something like that.
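To illustrate the layering idea, a hedged config sketch: the inner layer (the DFS client) gives up on a bad datanode before the outer layer (the HBase RPC) times out. The property names and values below are illustrative assumptions; the exact keys depend on the HBase/HDFS versions in use.

```xml
<!-- hbase-site.xml: the outermost timeout layer (illustrative) -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>30000</value> <!-- 30s -->
</property>

<!-- hdfs-site.xml: DFS client socket timeout, kept shorter so the
     HDFS layer detects a dead datanode before HBase's own timeout
     fires (illustrative) -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>20000</value> <!-- 20s -->
</property>
```

The point is only the ordering constraint: each inner layer's timeout should be strictly smaller than the layer above it.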
Whether it's ZooKeeper heartbeats or datanode heartbeats to the namenode, at a high level the mechanisms are similar.
Fully agreed. Just that if the issue comes from ZK or the ZK links, HBase and HDFS would have a similar view of the situation (maybe a wrong view, but the same view). On the other hand, there are possible improvements, not available in ZK today, but hopefully available one day when there is more code to share (I'm thinking about ZOOKEEPER-702). Also, still long term, ZK creates one TCP connection per monitored process. If multiple Hadoop processes share the same tech, it would make sense to have a shared component on each computer to lower the number of connections. I'm not aware of anything on this subject in ZK, so that's science fiction today. I've got other stuff like this in mind, but you get the idea.
So, I fully agree with your main point: today the real issue is the right timeout.
The problem is one of choosing the right timeout. Currently this is configurable in HDFS, and 10 minutes is chosen as the timeout. I suggest running some experiments with setting this to a more aggressive value. I agree that this is a very conservative time, but false positives here could result in a replication storm.
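For reference, the ~10-minute figure is not a single knob: the namenode derives it from two heartbeat settings, roughly 2 × recheck-interval + 10 × heartbeat-interval. A sketch of the arithmetic with the usual defaults (the exact property names and formula may differ across Hadoop versions):

```python
# Sketch of how the HDFS namenode derives the dead-node timeout
# from its two heartbeat settings (defaults shown; exact property
# names and formula may vary across Hadoop versions).

def dead_node_timeout_ms(heartbeat_recheck_ms=5 * 60 * 1000,
                         heartbeat_interval_ms=3 * 1000):
    # A datanode is declared dead after missing heartbeats for
    # 2 * recheck-interval + 10 * heartbeat-interval.
    return 2 * heartbeat_recheck_ms + 10 * heartbeat_interval_ms

print(dead_node_timeout_ms() / 60000.0)  # 10.5 minutes with the defaults
```

So "setting this to a more aggressive value" mostly means lowering the recheck interval, which is the dominant term.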
Agreed; even with the current setting, people have had issues in the past. 10 minutes seems to be a reasonable, real-world-validated timeout for re-replicating, and I don't think it's a good idea to make it lower. However, I think it would be good to have a middle state between fully available and definitively dead: non-responding nodes could be removed from the target list for new blocks and de-prioritized for reads.
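The proposed middle state could look like the sketch below: a node that misses heartbeats is first marked stale (skipped as a target for new blocks, de-prioritized for reads) and only much later declared dead (triggering re-replication). All names and thresholds here are illustrative assumptions, not existing HDFS behavior.

```python
# Hedged sketch of a three-state node tracker: LIVE -> STALE -> DEAD.
# Thresholds and names are illustrative, not real HDFS settings.
from enum import Enum

class NodeState(Enum):
    LIVE = 1
    STALE = 2   # avoid for new blocks, de-prioritize for reads
    DEAD = 3    # trigger re-replication

STALE_AFTER_S = 30          # aggressive, aligned with HBase-level timeouts
DEAD_AFTER_S = 10.5 * 60    # conservative, matches the current ~10 min

def node_state(last_heartbeat_s, now_s):
    silent = now_s - last_heartbeat_s
    if silent >= DEAD_AFTER_S:
        return NodeState.DEAD
    if silent >= STALE_AFTER_S:
        return NodeState.STALE
    return NodeState.LIVE

def pick_write_targets(nodes, now_s):
    # Only fully live nodes are candidates for new blocks;
    # stale nodes are skipped without being declared dead.
    return [n for n, hb in nodes.items()
            if node_state(hb, now_s) == NodeState.LIVE]
```

The appeal is that the stale threshold can be aggressive (false positives only cost a slightly worse placement choice) while the dead threshold stays conservative (false positives there cost a replication storm).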