Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.16.2
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      10 node cluster + 1 namenode

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changed the 'du' command to run in a separate thread so that it does not block the user.

      Description

      I recently upgraded to 0.16.2 from 0.15.2 on our 10 node cluster.
      Unfortunately we're seeing datanode timeout issues. In previous versions we've often seen in the namenode web UI that one or two datanodes' "last contact" goes from the usual 0-3 seconds to ~200-300 before it drops back down to 0 again.

      This causes mild discomfort but the big problems appear when all nodes do this at once, as happened a few times after the upgrade.
      It was suggested that this could be due to namenode garbage collection, but looking at the gc log output it doesn't seem to be the case.

      1. hadoop-hadoop-namenode-master2.out
        25 kB
        Johan Oskarsson
      2. hadoop-hadoop-datanode-new.out
        60 kB
        Johan Oskarsson
      3. hadoop-hadoop-datanode-new.log
        6.15 MB
        Johan Oskarsson
      4. hadoop-hadoop-datanode.out
        114 kB
        Johan Oskarsson
      5. du-nonblocking-v6-trunk.patch
        7 kB
        Johan Oskarsson
      6. du-nonblocking-v5-trunk.patch
        6 kB
        Johan Oskarsson
      7. du-nonblocking-v4-trunk.patch
        6 kB
        Johan Oskarsson
      8. du-nonblocking-v2-trunk.patch
        4 kB
        Johan Oskarsson
      9. du-nonblocking-v1.patch
        8 kB
        Johan Oskarsson

        Issue Links

          Activity

          Johan Oskarsson added a comment -

          Example client exception:

          08/04/10 10:09:27 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.0.5.76:50010
          08/04/10 10:09:27 INFO dfs.DFSClient: Abandoning block blk_-5192866954303337577
          08/04/10 10:09:27 INFO dfs.DFSClient: Waiting to find target node: 10.0.5.70:50010

          Datanode log output from around the same time:

          2008-04-10 10:09:15,758 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 got response for connect ack from downstream datanode with firstbadlink as 10.0.5.76:50010
          2008-04-10 10:09:15,758 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 forwarding connect ack to upstream firstbadlink is 10.0.5.76:50010
          2008-04-10 10:09:15,758 INFO org.apache.hadoop.dfs.DataNode: PacketResponder blk_-9149681676832791404 2 Exception java.io.EOFException
          at java.io.DataInputStream.readFully(DataInputStream.java:180)
          at java.io.DataInputStream.readLong(DataInputStream.java:399)
          at org.apache.hadoop.dfs.DataNode$PacketResponder.run(DataNode.java:1822)
          at java.lang.Thread.run(Thread.java:619)

          2008-04-10 10:09:15,758 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 2 for block blk_-9149681676832791404 terminating
          2008-04-10 10:09:16,793 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_1511572447827516117 received exception java.net.SocketTimeoutException: Read timed out
          2008-04-10 10:09:16,793 ERROR org.apache.hadoop.dfs.DataNode: 10.0.5.70:50010:DataXceiver: java.net.SocketTimeoutException: Read timed out
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:129)
          at java.net.SocketInputStream.read(SocketInputStream.java:182)
          at java.io.DataInputStream.readByte(DataInputStream.java:248)
          at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:324)
          at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
          at org.apache.hadoop.io.Text.readString(Text.java:413)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1117)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
          at java.lang.Thread.run(Thread.java:619)

          2008-04-10 10:09:23,895 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-7442302015902809712 src: /10.0.5.74:55546 dest: /10.0.5.74:50010
          2008-04-10 10:09:23,896 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
          2008-04-10 10:09:23,927 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-7442302015902809712 of size 902 from /10.0.5.74
          2008-04-10 10:09:23,928 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_-7442302015902809712 terminating
          2008-04-10 10:09:23,937 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_4161972554165500020 src: /10.0.11.7:41256 dest: /10.0.11.7:50010
          2008-04-10 10:09:23,968 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 got response for connect ack from downstream datanode with firstbadlink as
          2008-04-10 10:09:23,969 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 forwarding connect ack to upstream firstbadlink is
          2008-04-10 10:09:24,034 INFO org.apache.hadoop.dfs.DataNode: Received block blk_4161972554165500020 of size 95 from /10.0.11.7
          2008-04-10 10:09:24,034 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 2 for block blk_4161972554165500020 terminating
          2008-04-10 10:09:27,664 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 got response for connect ack from downstream datanode with firstbadlink as 10.0.5.76:50010
          2008-04-10 10:09:27,664 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 forwarding connect ack to upstream firstbadlink is 10.0.5.76:50010
          2008-04-10 10:09:27,665 INFO org.apache.hadoop.dfs.DataNode: PacketResponder blk_-5192866954303337577 2 Exception java.io.EOFException
          at java.io.DataInputStream.readFully(DataInputStream.java:180)
          at java.io.DataInputStream.readLong(DataInputStream.java:399)
          at org.apache.hadoop.dfs.DataNode$PacketResponder.run(DataNode.java:1822)
          at java.lang.Thread.run(Thread.java:619)

          Johan Oskarsson added a comment -

          Stacktrace of a datanode that lost contact with the namenode. Not sure it helps but doesn't hurt.

          Johan Oskarsson added a comment -

          Namenode stacktrace

          Johan Oskarsson added a comment -

          Found 68 of these when doing a stackdump in a datanode that had lost contact with the namenode.

          "org.apache.hadoop.dfs.DataNode$DataXceiver@5b02a6" daemon prio=10 tid=0x7192bc00 nid=0x5bc6 waiting for monitor entry [0x70cee000..0x70cef0c0]
          java.lang.Thread.State: BLOCKED (on object monitor)
          at org.apache.hadoop.dfs.FSDataset.getFile(FSDataset.java:867)
          - waiting to lock <0x77ce9360> (a org.apache.hadoop.dfs.FSDataset)
          at org.apache.hadoop.dfs.FSDataset.isValidBlock(FSDataset.java:795)
          at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:614)
          at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1995)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
          at java.lang.Thread.run(Thread.java:619)
          Raghu Angadi added a comment -

          > Found 68 of these when doing a stackdump in a datanode that had lost contact with the namenode.

          Could you attach the full stack trace of this datanode?

          Johan Oskarsson added a comment -

          As requested

          Raghu Angadi added a comment -

          While you are at it, could you attach the log (.log file) for this datanode as well? The log file shows what activity is going on now. Also, note the approximate time the stack trace was taken.

          My observation is that you are writing a lot of blocks... and the datanode that looks blocked is blocked while listing all the blocks on the native filesystem. It does this every hour when it sends block reports. So far nothing suspicious other than heavy write traffic and slow disks. Check iostat on the machine. What is the hardware like?

          The two main threads from the DataNode are:

          1. locks 0x782f1348 :
            "DataNode: [/var/storage/1/dfs/data,/var/storage/2/dfs/data,/var/storage/3/dfs/data,/var/storage/4/dfs/data]" daemon prio=10 tid=0x72409000
             nid=0x44f6 runnable [0x71d8a000..0x71d8aec0]
               java.lang.Thread.State: RUNNABLE
                    at java.io.UnixFileSystem.list(Native Method)
                    at java.io.File.list(File.java:973)
                    at java.io.File.listFiles(File.java:1051)
                    at org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:153)
                    at org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:149)
                    at org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:149)
                    at org.apache.hadoop.dfs.FSDataset$FSVolume.getBlockInfo(FSDataset.java:368)
                    at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getBlockInfo(FSDataset.java:434)
                    - locked <0x782f1348> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
                    at org.apache.hadoop.dfs.FSDataset.getBlockReport(FSDataset.java:781)
                    at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:642)
                    at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2431)
                    at java.lang.Thread.run(Thread.java:619)
            
          2. locked 0x77ce9360 and waiting on 0x782f1348
            "org.apache.hadoop.dfs.DataNode$DataXceiver@101f287" daemon prio=10 tid=0x71906400 nid=0x5a93 waiting for monitor entry [0x712de000..0x712defc0]
               java.lang.Thread.State: BLOCKED (on object monitor)
            	at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:665)
            	- waiting to lock <0x782f1348> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
            	- locked <0x77ce9360> (a org.apache.hadoop.dfs.FSDataset)
            	at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1995)
            	at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
            	at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
            	at java.lang.Thread.run(Thread.java:619)
            
          3. most other threads are waiting on 0x77ce9360
          Raghu Angadi added a comment -

          Is there any change in load on the cluster compared to when it was running 0.15?

          Johan Oskarsson added a comment -

          Here's the log file for that machine.
          Hardware: 8 cores @ 1.86 GHz, 8 GB RAM, 4x750 GB 7200 rpm disks, with gigabit Ethernet connections.

          Example iostat from a node that just lost contact with the namenode.

          avg-cpu:  %user   %nice    %sys %iowait   %idle
                    18.88    0.00    1.60    0.99   78.53

          Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
          sda          0.55   0.72  4.77  4.42  837.81 1831.38  418.91  915.69   290.43     0.26  28.17   7.28   6.69
          sdb          0.46   0.87  4.42  3.92  771.67 1443.65  385.84  721.83   265.93     0.05   5.43   7.87   6.56
          sdc          0.54   0.59  4.59  3.72  823.05 1676.57  411.52  838.28   300.87     0.11  13.06   7.92   6.58
          sdd          0.45   0.63  4.32  3.51  756.46 1433.20  378.23  716.60   279.92     0.29  36.78   7.95   6.22

          Raghu Angadi added a comment - - edited

          Johan, you can enclose the formatted text inside {noformat} ... {noformat} so that it is easier to read.

          Raghu Angadi added a comment -

          What is the exact iostat command you ran? Is it an average over the last 5 seconds or so, or an overall average?

          From the data

          2008-04-10 09:23:08,667 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 497572 blocks got processed in 381086 msecs
          

          A few things about this:

          • You have around 500K blocks, most of them very small (verification time is very short). This is an order of magnitude more than our datanodes have here at Yahoo.
          • The above log message says it took 6.5 min for the block report. Most of this time, I would think, is spent listing the files in the local directories.
          • Such a large number of blocks should cause a similar problem with 0.15 as well. Is anything different that you can think of?

          Though the DataNode should handle a large number of blocks better, I don't think this is a blocker for the 0.16.3 release (unless iostat shows something else). Do you agree?

          Btw, are you planning to have a large number of small blocks? It's going to limit NameNode scalability.

          Raghu Angadi added a comment -

          I am currently marking this for 0.18. 0.16.3 is close to being released, and it looks like any fix for this would not be trivial. Also, it does not seem like a regression yet. But if that turns out to be different, we can make this a candidate for 0.16.4 or 0.17.

          Johan Oskarsson added a comment -

          Here's the second output from iostat -x 30 from a datanode that just lost contact.

          avg-cpu:  %user   %nice    %sys %iowait   %idle
                    94.26    0.00    0.83    0.00    4.91
          
          Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
          sda          0.40   0.20 137.45  5.10 2031.86 2680.44  1015.93  1340.22    33.06     1.93   13.53   6.72  95.73
          sdb          0.23   0.03  0.33  1.20  145.02  552.22    72.51   276.11   454.87     0.02   10.22   4.78   0.73
          sdc          0.50   0.03  0.70  8.76  307.10 7137.72   153.55  3568.86   786.69     4.70  496.94   6.90   6.53
          sdd          0.47   0.07  0.83  0.53  315.63    4.83   157.81     2.42   234.56     0.06   44.39  10.73   1.47
          

          I'm aware that we have quite a lot of small files; it's an issue we're working on. I guess we'll have to ramp up the priority.
          Your suggestion that the block reports are causing this makes sense. I'll merge a few of these directories of small files and see if it improves.

          Perhaps the difference between 0.15 and 0.16 was just that we hit it quite hard after the upgrade, as data had queued up.
          I'm going to see how it behaves today.

          Johan Oskarsson added a comment -

          We've reduced the number of blocks per node to about 150,000, but we're still seeing the same problem as before.

          Raghu Angadi added a comment -

          Can you check for messages similar to "[...] BlockReport of 497572 blocks got processed in 381086 msecs" on such datanodes?

          Raghu Angadi added a comment -

          One thing I am not sure about yet is why your sda is so much busier than the other disks... somehow all the native filesystem metadata falls on one disk? Another possibility is swapping, but the size of each request is very small (~512 bytes).

          Raghu Angadi added a comment -

          > "DataNode: [/var/storage/1/dfs/data,/var/storage/2/dfs/data,/var/storage/3/dfs/data,/var/storage/4/dfs/data]"
          Is each of /var/storage/[1-4] mounted on a different disk, or are they all on sda?

          Johan Oskarsson added a comment -

          All those directories are on separate disks.

          One thing I didn't think about was that even though we merged a lot of files to reduce the number of blocks, there would still be tons of files in dfs/data/previous, since I had not run -finalizeUpgrade.
          So if I'm not mistaken, the du command would still go through all of those, taking quite some time. I've now finalized the upgrade and we're seeing better performance: BlockReport of 147083 blocks got processed in 25432 msecs

          Although having the datanodes lose contact with the namenode because they're checking disk usage seems like quite a serious bug to me.

          Not sure why sda is busier, although that is where the logs are located.

          Raghu Angadi added a comment -

          Yes, it looks like DU goes through previous as well... So instead of both the block report and DU causing problems, now it is just DU.

          Can you clarify whether "datanodes lose contact" means the NameNode actually marks them "dead"?

          > Although having the datanodes lose contact with the namenode because it's checking disk usage seems like quite a serious bug to me.

          I agree. Doing these in the background without blocking normal DataNode functions takes a little bit of restructuring. We should keep this jira open.

          > not sure why sda is more busy, although that is where the logs are located
          This might help your situation. If you find more info, please inform us.

          Johan Oskarsson added a comment -

          Created a simple patch that makes DU run the shell command in a new thread and never block on getUsage(). It does change the behavior a bit, but in most cases it shouldn't be a problem.
          I haven't had a chance to test this on a real cluster yet; it works in my local testing though.

          Unfortunately it turns out this only solves part of the problem. Block reports are still an issue.
          That's harder to solve, though; I'm not sure how to restructure that. Ideas?
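
          To make the approach concrete for readers of this thread, here is a minimal, hypothetical sketch of the idea: a background thread refreshes a cached value so callers never wait on the shell command. The class and method names are illustrative and are not taken from the attached patch.

          import java.io.BufferedReader;
          import java.io.File;
          import java.io.IOException;
          import java.io.InputStreamReader;

          // Sketch only: run 'du -sk <dir>' in a background thread and serve a cached value.
          class CachedDu implements Runnable {
            private final File dir;
            private final long intervalMs;
            private volatile long usedBytes;

            CachedDu(File dir, long intervalMs) throws IOException {
              this.dir = dir;
              this.intervalMs = intervalMs;
              this.usedBytes = runDu();            // populate once so the first caller sees a real value
              Thread t = new Thread(this, "du-refresh");
              t.setDaemon(true);
              t.start();
            }

            long getUsed() { return usedBytes; }   // returns the cached value; never blocks on du

            public void run() {
              while (true) {
                try {
                  Thread.sleep(intervalMs);
                  usedBytes = runDu();
                } catch (InterruptedException e) {
                  return;
                } catch (IOException e) {
                  // keep the previous value and retry on the next interval
                }
              }
            }

            private long runDu() throws IOException {
              Process p = Runtime.getRuntime().exec(new String[] {"du", "-sk", dir.getPath()});
              BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
              try {
                return Long.parseLong(r.readLine().trim().split("\\s+")[0]) * 1024L;
              } finally {
                r.close();
              }
            }
          }

          The actual attached patch works against Hadoop's own DU/Shell classes rather than invoking du directly like this.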

          Raghu Angadi added a comment -

          Regarding block reports, one option is not to do the fs scan at all.

          Doug Cutting added a comment -

          > Block reports are still an issue.

          Will incremental block reports (HADOOP-1079) resolve this?

          Raghu Angadi added a comment -

          > Will incremental block reports (HADOOP-1079) resolve this?
          If it avoids scanning the file system, yes. From the above jira it is not clear whether it will avoid comparing the in-memory map with the on-disk blocks.

          Even a change as simple as commenting out the fs scan would be enough for this problem, I think.

          Johan Oskarsson added a comment -

          If I understand this correctly the purpose of a block report is to let the namenode know if a block mysteriously goes missing on the data node.

          Would it make sense to have this as part of the block verification mechanism described in HADOOP-2012? Is it already?
          That way the block report could always be sent from memory and wouldn't have to hit the disk.

          Of course, just commenting out the fs scan as Raghu suggests would also work; how big an issue is blocks going missing?

          Johan Oskarsson added a comment -

          Shall we fix the DU issue first? It seems easier. Could someone review my patch, please?
          Then we can create another ticket for the block report problem.

          Raghu Angadi added a comment -

          Regarding the patch,
          > It does change the behavior a bit, but in most cases it shouldn't be a problem.

          I haven't looked at it properly yet; could you describe what the change in behavior is? Also, I'm not sure why it needs to change the Shell stuff. Could the desired behavior for DF be implemented in the DF class?

          Doug Cutting added a comment -

          > Also not sure why it needs to change Shell stuff. [ ... ]

          I think this was because Johan wanted to make DF implement Runnable, and there was a conflict, since Shell already has a method named 'run'. But this changes public APIs incompatibly, and is thus not a good approach.

          Johan, perhaps instead we could define a nested class in DF.java that extends Thread and overrides run() there, or implements Runnable, if you prefer.

          Also, the default interval is dfs.blockreport.intervalMsec, which seems rather long. It should really be related to the heartbeat interval, no? Moreover, we shouldn't use a DFS parameter in a generic FS class. So the default interval should be something safe, perhaps hardwired to 10 minutes or so, and DFS should override that when it constructs a DF, if it needs to. Does that make sense?

          And perhaps the thread should run 'df' first, then sleep, so that values are available to clients sooner?

          Finally, and most important, what evidence do you have that DU is in fact causing problems? It is run very infrequently, not strictly synchronized with block reports but rather triggered by heartbeats. A slow DU would thus result in a delayed heartbeat. None of the stack traces above indicate that DU is blocking other activities of the datanode.

          Johan Oskarsson added a comment -

          Cleaned up the patch a bit, now only changes DU.java + the test

          Johan Oskarsson added a comment -

          Doug: you're right about the Runnable/run() bit; just as you wrote that, I adapted the patch as you suggested.

          I agree about the interval; I'll change it.
          This patch is for DU; DF returns so quickly that it shouldn't cause an issue.

          In the DU constructor the command is run once so that we get values straight away. I thought this would be better since then we know for sure there are correct values in there once the object is created.

          I'll try to recreate the situation to produce good evidence, but off the top of my head DU is used to decide what volume to write to in writeToBlock in FSDataset, so it causes problems with writing blocks if it takes too long. We've seen quite a lot of this.
          As you say, it doesn't run that often, but often enough to cause us problems.

          Doug Cutting added a comment -

          This addresses the first of my above concerns, but not the other three.

          Also, does that test in fact succeed? It seems to me that the thread will not yet have had a chance to update its usage count before the assertEquals(), no?

          Doug Cutting added a comment -

          Oops. I posted my previous comment before I saw your last comment. You're right, I was confusing DU and DF. And I missed the refresh in the ctor. Sorry! That resolves most of my concerns. Since this is called much more frequently than I was thinking, it makes sense for it not to be synchronous. I think my only remaining concern is the default interval, which you've said you'd address. Thanks!

          Raghu Angadi added a comment -

          It was my mistake to say 'DF' where I meant 'DU'.

          Patch looks good. A couple of comments:

          • It need not disallow an interval of zero. In that case, you could just not start the thread and invoke run() as before. Since DU is a utility, it is used (or could be used) outside the DataNode.
          • Persistent thread: in the normal case, since the thread does work only once in a while, it could be created only when it needs to run. This would be an improvement; I don't mean it as a hard requirement for this patch. It is more in line with the previous behaviour, since if getUsed() is not called there is no penalty.

          Are you using this (or the previous) patch in your environment?

          This certainly improves DN stability with a large number of blocks. We still need to keep in mind that DU has a very noticeable penalty. Say it takes around 10 min (as in your case)... then it implies that 15% of the time the DN will be extremely I/O starved. This will have a very noticeable effect on I/O-intensive applications.

          Johan Oskarsson added a comment -

          Updated the patch with the suggestions from Doug and Raghu.
          The interval defaults to 10 min. If the incoming interval is 0, the previous behavior is used.

          Findbugs doesn't like that I start a thread in the constructor, but as far as I know it's the only way without adding a start method to the class, and I assume you don't want to change the interface.
          This passes all the tests and checkstyle on my local machine. I also added a bunch of javadoc.

          Raghu: yes, I am using a previous patch on our cluster; no problems so far.
          I'm not sure what you mean by not having a permanent thread. How would we update the value without blocking in getUsed() in that case? You say it's not a hard requirement; I hope you can accept the patch anyway.

          Raghu Angadi added a comment -

          > I'm not sure what you mean by not having a permanent thread. How would we update the value without blocking on getUsed in that case?

          This thread will stay idle most of the time. So we could start a thread inside getUsed() (and possibly in the other accessor methods, if the interval has passed) and make the thread exit after running du. This is no less accurate than the current implementation. Or you could schedule a periodic task using a Java 'Executor'. Even if you do keep the persistent thread, could you add a comment, if you agree that it need not be persistent... we might implement that later.
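
          For illustration, a rough sketch of the Executor-based alternative using java.util.concurrent; the class and field names are hypothetical and this is not code from any attached patch.

          import java.io.File;
          import java.util.concurrent.Executors;
          import java.util.concurrent.ScheduledExecutorService;
          import java.util.concurrent.TimeUnit;
          import java.util.concurrent.atomic.AtomicLong;

          // Sketch only: periodic refresh via a scheduler instead of a dedicated per-DU thread.
          class ScheduledUsage {
            private final AtomicLong used = new AtomicLong();
            private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

            ScheduledUsage(final File dir, long intervalMs) {
              used.set(dirSize(dir));                          // initial value so getUsed() is valid immediately
              scheduler.scheduleWithFixedDelay(new Runnable() {
                public void run() { used.set(dirSize(dir)); }  // refresh happens off the caller's thread
              }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
            }

            long getUsed() { return used.get(); }              // never blocks on the refresh

            void shutdown() { scheduler.shutdownNow(); }

            // stand-in for running 'du': walks the directory tree instead
            private static long dirSize(File f) {
              if (f.isFile()) return f.length();
              long total = 0;
              File[] children = f.listFiles();
              if (children != null) {
                for (File c : children) total += dirSize(c);
              }
              return total;
            }
          }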

          Regarding the patch:

          1. 'lock' is not required. You can synchronize on DU.this.
          2. Also, DURefreshThread should either be a static class or not keep a reference to the DU.
          Johan Oskarsson added a comment -

          1. Changed
          2. Removed the reference to DU and call DU.this.run() instead.

          Scheduling it with an Executor would still have a running thread that keeps track of the scheduling, no? ScheduledThreadPoolExecutor, for example.
          The other option was starting a thread inside getUsed() and then returning the old value while du runs? Wouldn't that cause problems where getUsed() is called very infrequently? Perhaps this is never an issue?

          Anyway, I added the comment about improving this with a non-permanent thread; in the most common case I guess that would work fine.

          As mentioned, this patch will cause one new Findbugs warning (starting a thread in the constructor), which is hard to avoid without adding a new method to start it and thereby changing the public interface.

          Raghu Angadi added a comment -

          > Scheduling it with an Executor would still have a running thread that keeps track of the scheduling, no?
          That is implementation-dependent; the JavaDoc does not say. There might be just one thread that handles many such tasks. At least we let the Executors implementation optimize it.

          DU looks like a simple utility to users. If they create these multiple times, whenever they need one, they will be surprised to see threads hanging around. In that sense, it might be better to add a "start()" method so that it is explicit to them that a thread might be started and it needs to be shut down.

          +1 overall.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12382039/du-nonblocking-v5-trunk.patch
          against trunk revision 656270.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to cause Findbugs to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2467/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2467/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2467/console

          This message is automatically generated.

          Raghu Angadi added a comment -

          Actually, requiring start() is not so bad. Also, the DataNode could invoke shutdown() inside its own shutdown. It will also remove the Findbugs warning.

          Johan Oskarsson added a comment -

          The patch failed, my bad; I ran all my tests with Java 6, so I didn't catch the @Override annotations that Eclipse likes to put in. They don't work well with Java 5.

          Raghu: Sure, I can add a start method if you guys want one.

          Johan Oskarsson added a comment -

          Updated the patch with start and shutdown methods. If start isn't invoked, the previous behavior of running on demand is used.

          I'm having issues running the unit tests on my machine on a clean trunk (TestDatanodeBlockScanner fails), so I hope this one passes all the tests.
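
          For reference, the start()/shutdown() shape described here looks roughly like the following sketch; the class name and the 10-minute constant are placeholders, and the actual du invocation is elided.

          // Sketch only: no background thread is created unless the caller opts in via start().
          class UsageRefresher {
            private volatile long used;
            private volatile boolean running;
            private Thread refreshThread;          // null until start() is called

            public synchronized void start() {
              if (refreshThread != null) return;   // idempotent
              running = true;
              refreshThread = new Thread(new Runnable() {
                public void run() {
                  while (running) {
                    used = refresh();
                    try { Thread.sleep(600000L); } catch (InterruptedException e) { return; }
                  }
                }
              }, "usage-refresh");
              refreshThread.setDaemon(true);
              refreshThread.start();
            }

            public synchronized void shutdown() {
              running = false;
              if (refreshThread != null) refreshThread.interrupt();
            }

            public long getUsed() {
              if (refreshThread == null) {         // start() never called: fall back to refreshing on demand
                used = refresh();
              }
              return used;
            }

            private long refresh() { return 0L; }  // placeholder for actually running 'du'
          }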

          Raghu Angadi added a comment -

          > TestDatanodeBlockScanner fails,..

          Please update your trunk; a fix for this was committed yesterday. If it still fails on your machine, let us know.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12382371/du-nonblocking-v6-trunk.patch
          against trunk revision 658862.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2512/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2512/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2512/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2512/console

          This message is automatically generated.

          Raghu Angadi added a comment -

          +1. Thanks for the multiple iterations. If a user does not call start(), the interval given will be silently ignored... in that sense, the interval could be an argument to start(). But I will commit the v6 patch for now.

          Raghu Angadi added a comment -

          I just committed this. Thanks Johan!

          Hudson added a comment -

          Integrated in Hadoop-trunk #500 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/500/ )

          Jason added a comment -

          I propose an alternative solution for this.
          If the block information was managed by an inotify task (on Linux/Solaris), or the Windows equivalent (whose name I forget), the datanode could be informed each time a file in the dfs tree is created, updated, or deleted.

          With this information being delivered, it could maintain an accurate block map with only one full scan of the datanode blocks, at start time.

          With this algorithm the data nodes will be able to scale to a much larger number of blocks.
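
          As a purely illustrative sketch (not part of any patch here), this is roughly what event-driven directory watching looks like with the java.nio.file WatchService, which later JDKs provide as a portable front end to mechanisms like inotify:

          import java.nio.file.FileSystems;
          import java.nio.file.Path;
          import java.nio.file.Paths;
          import java.nio.file.WatchEvent;
          import java.nio.file.WatchKey;
          import java.nio.file.WatchService;
          import static java.nio.file.StandardWatchEventKinds.*;

          // Hypothetical watcher: after one full scan at startup, the block map could be kept
          // current from create/modify/delete events instead of periodic directory rescans.
          public class BlockDirWatcher {
            public static void main(String[] args) throws Exception {
              Path dataDir = Paths.get(args[0]);                   // e.g. a dfs data directory
              WatchService watcher = FileSystems.getDefault().newWatchService();
              dataDir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);

              while (true) {
                WatchKey key = watcher.take();                     // blocks until events arrive
                for (WatchEvent<?> event : key.pollEvents()) {
                  // a real datanode would update its in-memory block map here
                  System.out.println(event.kind() + ": " + event.context());
                }
                if (!key.reset()) break;                           // directory no longer accessible
              }
            }
          }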

          The other thing is that the way the synchronized blocks on FSDataset.FSVolumeSet are held totally aggravates this bug in 0.18.1.

          The jason@attributor.com address will be going away shortly, I will be switching to jason.hadoop@gmail.com in the next little bit.

          Johan Oskarsson added a comment -

          Sounds very interesting. I suggest opening a new ticket describing the solution in more detail so the discussion can continue there instead of in this closed ticket.


            People

            • Assignee:
              Johan Oskarsson
            • Reporter:
              Johan Oskarsson
            • Votes:
              1
            • Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved:
