[HADOOP-4584] Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.21.0
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

Sometimes, due to disk or other problems, a DataNode takes minutes or even tens of minutes to generate a block report. While it does so, the DataNode cannot send its heartbeat to the NameNode every 3 seconds. In the worst case, the NameNode detects a lost heartbeat and wrongly decides that the DataNode is dead.

It would be nice to have two threads instead. One thread scans the data directories, generates the block report, and executes the requests sent by the NameNode. The other thread sends heartbeats and block reports and picks up requests from the NameNode. With these two threads, the sending of heartbeats is no longer delayed by a slow block report or by slow execution of NameNode requests.
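
The two-thread split proposed above could look roughly like the following. This is a minimal sketch and not the code from the attached patches: the class `TwoThreadDataNodeSketch`, its fields, and the `slowScan` callable are hypothetical names chosen for illustration, and the "send" steps only bump counters instead of making RPC calls to a NameNode.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: one thread talks to the NameNode, another does slow disk work.
public class TwoThreadDataNodeSketch {
    final AtomicInteger heartbeatsSent = new AtomicInteger();
    final AtomicInteger blockReportsSent = new AtomicInteger();

    // Hand-off slot: the worker thread fills it, the heartbeat thread drains it.
    private final AtomicReference<long[]> pendingReport = new AtomicReference<>();

    private final ScheduledExecutorService heartbeatThread =
        Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService workerThread =
        Executors.newSingleThreadExecutor();

    void start(long heartbeatIntervalMs, Callable<long[]> slowScan) {
        // Worker thread: scans the data directories and builds the block report.
        workerThread.submit(() -> {
            pendingReport.set(slowScan.call());
            return null;
        });

        // Heartbeat thread: never touches the disk, so it keeps its schedule
        // even while the scan above is still running.
        heartbeatThread.scheduleAtFixedRate(() -> {
            heartbeatsSent.incrementAndGet();          // stands in for the heartbeat RPC
            long[] report = pendingReport.getAndSet(null);
            if (report != null) {
                blockReportsSent.incrementAndGet();    // stands in for the block report RPC
            }
        }, 0, heartbeatIntervalMs, TimeUnit.MILLISECONDS);
    }

    void stop() throws InterruptedException {
        heartbeatThread.shutdownNow();
        workerThread.shutdownNow();
        heartbeatThread.awaitTermination(1, TimeUnit.SECONDS);
        workerThread.awaitTermination(1, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        TwoThreadDataNodeSketch dn = new TwoThreadDataNodeSketch();
        // A 50 ms interval and a 500 ms scan stand in for 3 seconds and minutes.
        dn.start(50, () -> {
            Thread.sleep(500);
            return new long[] {1L, 2L, 3L};
        });
        Thread.sleep(400);
        System.out.println("heartbeats sent while scan was running: " + dn.heartbeatsSent.get());
        Thread.sleep(400);
        dn.stop();
        System.out.println("block reports sent: " + dn.blockReportsSent.get());
    }
}
```

The only shared state is the single hand-off slot, so the heartbeat thread never waits on the scan. A real DataNode would also need to hand NameNode commands from the heartbeat thread to the worker, which this sketch leaves out.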

Attachments

| File | Uploaded | Size | Uploaded by |
| --- | --- | --- | --- |
| 4584.patch | 02/Feb/09 18:51 | 16 kB | Suresh Srinivas |
| 4584.patch | 02/Feb/09 20:10 | 15 kB | Suresh Srinivas |
| 4584.patch | 06/Feb/09 03:29 | 18 kB | Suresh Srinivas |
| 4584.patch | 08/Feb/09 19:56 | 18 kB | Suresh Srinivas |
| 4584.patch | 11/Feb/09 04:35 | 18 kB | Suresh Srinivas |
| 4584.patch | 23/Feb/09 20:58 | 29 kB | Suresh Srinivas |
| 4584.hbthread.patch | 27/Feb/09 23:12 | 20 kB | Suresh Srinivas |
| 4584.brthread.2.patch | 12/Mar/09 22:42 | 54 kB | Suresh Srinivas |
| 4584.brthread.3.patch | 17/Mar/09 22:54 | 29 kB | Suresh Srinivas |
| 4584.brthread.3.patch | 18/Mar/09 01:35 | 54 kB | Suresh Srinivas |
| 4584.brthread.3.patch | 18/Mar/09 17:28 | 55 kB | Suresh Srinivas |
| 4584.brthread.3.patch | 23/Mar/09 22:04 | 46 kB | Suresh Srinivas |
| 4584.brthread.3.patch | 25/Mar/09 20:28 | 47 kB | Suresh Srinivas |
| Design.pdf | 26/Mar/09 00:26 | 65 kB | Suresh Srinivas |
| Design.pdf | 27/Mar/09 22:37 | 69 kB | Suresh Srinivas |
| 4584.brthread.4.patch | 31/Mar/09 21:44 | 50 kB | Suresh Srinivas |
| 4584.brthread.4.patch | 02/Apr/09 02:34 | 49 kB | Suresh Srinivas |
| 4584.brthread.4.patch | 03/Apr/09 06:03 | 49 kB | Suresh Srinivas |
| 4584.brthread.5.patch | 09/Apr/09 00:50 | 50 kB | Suresh Srinivas |
| 4584.brthread.5.patch | 09/Apr/09 19:25 | 50 kB | Suresh Srinivas |

Issue Links

is part of: HADOOP-4556 Block went missing (Closed)
is related to: HDFS-770 SocketTimeoutException: timeout while waiting for channel to be ready for read (Open)
is related to: HDFS-2379 0.20: Allow block reports to proceed without holding FSDataset lock (Closed)
relates to: HADOOP-3232 Datanodes time out (Closed)

People

Assignee: Suresh Srinivas
Reporter: Hairong Kuang
Votes: 0
Watchers: 14

Dates

Created: 04/Nov/08 00:51
Updated: 28/Sep/11 12:06
Resolved: 10/Apr/09 20:16