Hadoop HDFS / HDFS-4239

Means of telling the datanode to stop using a sick disk

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      If a disk has been deemed 'sick' – i.e. not dead but wounded, failing occasionally, or just exhibiting high latency – your choices are:

      1. Decommission the whole datanode. If the datanode is carrying 6 or 12 disks of data, especially on a smallish cluster – 5 to 20 nodes – the re-replication of the downed datanode's data can be pretty disruptive, especially if the cluster is doing low-latency serving: e.g. hosting an HBase cluster.

      2. Stop the datanode, unmount the bad disk, and restart the datanode (you can't unmount the disk while it is in use). The latter is better in that only the bad disk's data is re-replicated, not all of the datanode's data.

      Is it possible to do better? Say, send the datanode a signal telling it to stop using a disk an operator has designated 'bad'. This would be like option #2 above, minus the need to stop and restart the datanode. Ideally the disk would become unmountable after a while.

      A nice-to-have would be the ability to tell the datanode to start using a disk again after it's been replaced.

      Attachments

      1. hdfs-4239_v2.patch
        41 kB
        Jimmy Xiang
      2. hdfs-4239_v3.patch
        42 kB
        Jimmy Xiang
      3. hdfs-4239_v4.patch
        48 kB
        Jimmy Xiang
      4. hdfs-4239_v5.patch
        48 kB
        Jimmy Xiang
      5. hdfs-4239.patch
        19 kB
        Jimmy Xiang

        Issue Links

          Activity

          tlipcon Todd Lipcon added a comment -

          Would chmod 000 /path/to/data/dir do the trick? That should cause it to start getting IOExceptions, which would then get it to eject that disk from its list.

          stack stack added a comment -

          Let me try.

          qwertymaniac Harsh J added a comment -

          Todd's proposal would still require the DN to be started up with a > 1 toleration value via dfs.datanode.failed.volumes.tolerated.

          qwertymaniac Harsh J added a comment -

          Typo, I meant >= 1

          stevel@apache.org Steve Loughran added a comment -

          Would a umount -f let you force the unmount?

          The big volume management JIRA is HDFS-1362 – that's the one that really needs finishing off.

          adi2 Andy Isaacson added a comment -

          Would a umount -f let you force the unmount?

          Unfortunately not while a running process still has a file descriptor open on the volume. It would require revoke() support in the kernel, and that effort foundered many years ago. http://lwn.net/Articles/262528/

          adi2 Andy Isaacson added a comment -

          Would chmod 000 /path/to/data/dir do the trick? That should cause it to start getting IOExceptions, which would then get it to eject that disk from its list.

          Unfortunately the DN keeps the in_use.lock open even after the volume is marked failed.

          adi2 Andy Isaacson added a comment -

          To expand on my previous comment:

          I tested on trunk, on a DN with dfs.datanode.failed.volumes.tolerated=1. Running chmod 0 /data/5/datadir/current caused the DN to eject the volume and continue operating. I then used lsof -p to verify which file descriptors remained open and observed that /data/5/datadir/in_use.lock was still open.
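
          A minimal Java sketch of how such a storage-directory lock is typically held (simplified, and not the actual DataStorage code): the point is that once the descriptor is open and locked, a chmod of the parent directory does not revoke it, so the fd shows up in lsof until the process closes it.

            import java.io.File;
            import java.io.RandomAccessFile;
            import java.nio.channels.FileLock;

            public class InUseLockDemo {
              public static void main(String[] args) throws Exception {
                // Open and lock <dir>/in_use.lock the way a storage-directory lock is
                // commonly held: an open RandomAccessFile plus an exclusive file lock.
                File lockFile = new File(args[0], "in_use.lock");
                RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
                FileLock lock = raf.getChannel().tryLock();
                if (lock == null) {
                  System.err.println("Already locked by another process");
                  raf.close();
                  return;
                }
                // The descriptor stays open (and visible to lsof) even if the parent
                // directory is chmod'ed to 0 after this point.
                System.out.println("Holding " + lockFile + "; check with lsof -p <pid>");
                Thread.sleep(60000);
                lock.release();
                raf.close();
              }
            }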

          tiborvass Tibor Vass added a comment -

          What about stopping the datanode, chmod 0-ing, and restarting the datanode?

          adi2 Andy Isaacson added a comment -

          What about stopping the datanode, chmod 0-ing, and restarting the datanode?

          That should work just fine, if the HDFS config is compatible with the new set of available directories. That means either ensuring that the number of inaccessible datadirs does not exceed the dfs.datanode.failed.volumes.tolerated value, or removing the inaccessible datadir from dfs.data.dir.
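
          As a rough illustration of the check an operator is effectively doing by hand before the restart – a small Java sketch using the Hadoop Configuration API (this assumes hdfs-site.xml is on the classpath and uses the property names from this thread):

            import java.io.File;
            import org.apache.hadoop.conf.Configuration;

            public class VolumeConfigCheck {
              public static void main(String[] args) {
                Configuration conf = new Configuration();
                conf.addResource("hdfs-site.xml");
                String[] dataDirs = conf.getTrimmedStrings("dfs.data.dir");
                int tolerated = conf.getInt("dfs.datanode.failed.volumes.tolerated", 0);
                int inaccessible = 0;
                for (String dir : dataDirs) {
                  File f = new File(dir);
                  // Treat a dir we cannot fully access as failed, mirroring what the DN would see.
                  if (!f.canRead() || !f.canWrite() || !f.canExecute()) {
                    inaccessible++;
                  }
                }
                System.out.println(inaccessible + " of " + dataDirs.length
                    + " data dirs inaccessible; tolerated = " + tolerated);
                if (inaccessible > tolerated) {
                  System.out.println("DN would refuse to start: remove the bad dirs from"
                      + " dfs.data.dir or raise dfs.datanode.failed.volumes.tolerated");
                }
              }
            }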

          jdcryans Jean-Daniel Cryans added a comment -

          That should work just fine, if the HDFS config is compatible with the new set of available directories.

          We tried it today, and it worked fine. We did encounter an interesting problem, though: the region server on the same node continued to use that disk directly, since it's configured with local reads.

          To rephrase that, a long-running BlockReaderLocal will ride over local DN restarts and disk "ejections". We had to drain the RS of all its regions in order to stop it from using the bad disk.

          adi2 Andy Isaacson added a comment -

          I created HDFS-4284 to track the BlockReaderLocal issue.

          qwertymaniac Harsh J added a comment -

          HDFS-1362 matches this JIRA's needs I think.

          qwertymaniac Harsh J added a comment -

          I just noticed Steve's comment referring to the same – should've gone through the thread properly before spending Google cycles. I feel HDFS-1362, implemented, would solve half of this – and the other half would be to make the removals automatic. Right now checkDiskError does not eject a disk if it's slow, as long as it succeeds; that part would have to be done via this JIRA I think. The re-add would be possible via HDFS-1362.

          jxiang Jimmy Xiang added a comment -

          Attached a patch for trunk. It's good for branch-2 too.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12623271/hdfs-4239.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5891//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5891//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12623271/hdfs-4239.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5913//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5913//console

          This message is automatically generated.

          stack stack added a comment -

          This is lovely, Jimmy Xiang. The addition of the "+ System.err.println(" [-markDownVolume datanode-host:port location");" usage line is particularly so. In your testing, did you see whether the volume's 'in_use.lock' was cleaned up after removal of the volume? (See the comment by Andy above – https://issues.apache.org/jira/browse/HDFS-4239?focusedCommentId=13506791&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506791) Thanks.

          jxiang Jimmy Xiang added a comment -

          File 'in_use.lock' is still there after the volume is marked down. Let me take another look.

          jxiang Jimmy Xiang added a comment -

          We can release the lock after the volume is marked down. No new blocks will be allocated to this volume. What about blocks on this volume that are still being written? The write could take forever – for example, a rarely updated HLog file. I was thinking of failing the write pipeline so that the client can set up another pipeline. Any problem with that?

          stack stack added a comment -

          I think throwing an exception is the right thing to do. The volume is going away at the operator's volition.

          jxiang Jimmy Xiang added a comment -

          Cool. I agree. Attached v2, which releases all references to the volume marked down. In my test, I don't see any open file descriptors pointing to the marked-down volume.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12625766/hdfs-4239_v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage
          org.apache.hadoop.hdfs.TestPread
          org.apache.hadoop.hdfs.TestReplication
          org.apache.hadoop.hdfs.TestSmallBlock
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer
          org.apache.hadoop.hdfs.TestFileCreation
          org.apache.hadoop.hdfs.TestSetrepIncreasing
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
          org.apache.hadoop.hdfs.server.namenode.TestFileLimit
          org.apache.hadoop.hdfs.server.balancer.TestBalancer

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5983//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5983//console

          This message is automatically generated.

          jxiang Jimmy Xiang added a comment -

          Attached v3 that fixed the test failures.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626054/hdfs-4239_v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestAuditLogs

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5989//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5989//console

          This message is automatically generated.

          jxiang Jimmy Xiang added a comment -

          This test failure is not related.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626054/hdfs-4239_v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestPersistBlocks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6000//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6000//console

          This message is automatically generated.

          tlipcon Todd Lipcon added a comment -

          Couple quick high-level comments:

          • what's the authorization requirement here? The patch doesn't seem to do any access control, but I wouldn't want a non-admin to make these changes.
          • it seems odd that the "mark this volume dead" is non-persistent across restarts. If a disk is "dying", I'm nervous that someone would mark it bad, and then a later rolling restart of the service would revive it. Something like a config file of "blacklisted volume IDs" and a 'refresh' RPC might be more resistant to this type of issue – or a marker file like "disallow_this_volume" in the storage directory?
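
          A hypothetical sketch of the marker-file idea above – the marker name and this check are illustrative only and do not appear in any of the attached patches:

            import java.io.File;

            public class VolumeBlacklistCheck {
              // Hypothetical: a volume is skipped at startup if its storage directory
              // carries a "disallow_this_volume" marker file.
              static boolean isDisallowed(File storageDir) {
                return new File(storageDir, "disallow_this_volume").exists();
              }

              public static void main(String[] args) {
                for (String dir : args) {
                  File d = new File(dir);
                  System.out.println(d + (isDisallowed(d) ? " -> skip (marked disallowed)" : " -> use"));
                }
              }
            }
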
          jxiang Jimmy Xiang added a comment -

          Good point. Let me handle the access control in the next patch. As to "blacklisted volume IDs", can we handle it in a separate issue?

          jxiang Jimmy Xiang added a comment -

          Attached v4, which adds access control when security is enabled.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626799/hdfs-4239_v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure
          org.apache.hadoop.hdfs.server.namenode.TestAuditLogs

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6020//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6020//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626985/hdfs-4239_v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6027//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6027//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626985/hdfs-4239_v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6028//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6028//console

          This message is automatically generated.

          jxiang Jimmy Xiang added a comment -

          Ping. Can anyone take a look at patch v4? Thanks.

          yzhangal Yongjun Zhang added a comment -

          Hi Jimmy,

          Thanks for the good work. I went through patch v4 and it looks good to me. I only have a few comments, mostly cosmetic things, and I may be wrong myself.

          1. In DataNode.java:

             private void checkSuperuserPrivilege(String method) throws IOException {
               if (checkKerberosAuthMethod(method)) {
                 ...
               }
             }

          The above function checks superuser privilege only when Kerberos authentication is enabled. This seems not restrictive enough to me. However, I saw existing code in the same file that also does that, such as:

             private void checkBlockLocalPathAccess() throws IOException {
               checkKerberosAuthMethod("getBlockLocalPathInfo()");
               ...
             }

          So I'm actually not sure. Please correct me if I'm wrong. For instance, I found other existing code that checks superuser privilege, like

             ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
             public void checkSuperuserPrivilege()

          which seems to do things differently.

          2. In DataNode.java:

             /** Ensure that authentication method is kerberos */
             boolean checkKerberosAuthMethod(String msg) throws IOException {

          I suggest changing both the comment and the method name to something like:

             /** Check whether the authentication method is Kerberos; return true if so and false otherwise. */
             boolean isKerberosAuthMethodEnabled(...)

          3. In BlockPoolSliceScanner.java:

             private static final String VERIFICATION_PREFIX = "dncp_block_verification.log";

          You removed "private" from this declaration; I wonder if that's what you intended. It seems it should stay private.

          4. In DataBlockScanner.java:

             void volumeMarkedDown(FsVolumeSpi vol) throws IOException {

          I wonder whether we can change it to

             /** Relocate verification logs for a volume that's marked down. */
             void relocateVerificationLogs(FsVolumeSpi volMarkedDown) ...

          to make it clearer?

          5. In BlockPoolSliceScanner.java:

             void relocateVerificationLogs(FsVolumeSpi vol) throws IOException {
               if (verificationLog != null) {
                 // block of code
               }
               // no code here
             }

          If the block of code is large, it would be helpful to change it to

             void relocateVerificationLogs(FsVolumeSpi vol) throws IOException {
               if (verificationLog == null) {
                 return;
               }
               // block of code
             }

          This removes one level of indentation and makes it easier to read.

          Thanks.

          cmccabe Colin P. McCabe added a comment -

          Thanks for picking this up again, Jimmy Xiang. hdfs-4239_v5.patch did not apply cleanly to trunk for me; can you re-generate this patch?

          DataNode#checkSuperuserPrivilege: As Yongjun commented, it's kind of unfortunate that you are skipping this check when Kerberos is disabled. This will make unit testing the "permission denied" case harder than it has to be. I suggest using FSPermissionChecker. The constructor takes three arguments: the current UGI, the superuser, and supergroup, and by calling FSPermissionChecker#checkSuperuserPrivilege, you can figure out if you have superuser. I realize you were following the pattern in checkBlockLocalPathAccess, but that code is legacy now (only used in the legacy local block reader).
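
          As a rough sketch of what the check needs to accomplish, independent of how Kerberos is configured – this ignores how the caller UGI is obtained on the RPC path, and the superuser/supergroup values would come from the DN's configuration; the FSPermissionChecker approach Colin mentions wraps essentially this kind of logic on the NameNode side:

            import java.io.IOException;
            import org.apache.hadoop.security.AccessControlException;
            import org.apache.hadoop.security.UserGroupInformation;

            class AdminCheck {
              static void checkSuperuserPrivilege(String superUser, String superGroup)
                  throws IOException {
                UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
                if (ugi.getShortUserName().equals(superUser)) {
                  return;  // the configured superuser is always allowed
                }
                for (String group : ugi.getGroupNames()) {
                  if (group.equals(superGroup)) {
                    return;  // members of the supergroup are allowed
                  }
                }
                throw new AccessControlException("Access denied for user "
                    + ugi.getShortUserName() + ": superuser privilege is required");
              }
            }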

          VERIFICATION_PREFIX: as Yongjun commented, I think you meant to keep the "private" on here

          DataStorage.java: the only change here is a whitespace change. Seems like you meant to cut this out of the diff.

          BlockPoolSliceScanner: I don't think the file reopening belongs in the slice scanner. LogFileHandler is actually a pretty abstract interface which is unconcerned with the details of where things are in files. We get it out of FsDataSetSpi#createRollingLogs. Since the normal file rolling stuff is handled in RollingLogsImpl, that's where the re-opening of verification logs should be handled as well. It's the same kind of thing.

          Another way of thinking about it is: while it's true that currently the verification logs reside in files somewhere in directories on disk, this is an implementation detail. You can easily imagine a different implementation of FsDatasetSpi where there are no on-disk directories, or where the verification log is kept in memory.

          Also, I noticed you were trying to copy the old verification log from the failed disk in your relocateVerificationLogs function. This is not a good idea, since reading from a failed disk may hang or misbehave, causing issues with the DataNode's threads. We want to treat a failed disk as radioactive and not do any more reads or writes from there if we can help it.

          @@ -930,12 +940,15 @@ synchronized Packet waitForAckHead(long seqno) throws InterruptedException {
                */
               @Override
               public synchronized void close() {
          -      while (isRunning() && ackQueue.size() != 0) {
          -        try {
          -          wait();
          -        } catch (InterruptedException e) {
          -          running = false;
          -          Thread.currentThread().interrupt();
          +      try {
          +        while (isRunning() && ackQueue.size() != 0) {
          +          wait(runningCheckInterval);
          +        }
          +      } catch (InterruptedException e) {
          +        Thread.currentThread().interrupt();
          +      } catch (IOException ioe) {
          +        if(LOG.isDebugEnabled()) {
          +          LOG.debug(myString, ioe);
          

          Why are we catching and swallowing IOException here?

          +  @Override //FsDatasetSpi
          +  public FsVolumeImpl markDownVolume(File location
          

          I don't like the assumption here that our volumes are backed by files. This may not be true for all FsDatasetSpi implementations. How about changing this to take a URI instead?

          Similarly, let's change the user API (in DistributedFileSystem, etc.) to take a URI as well, all the way up to the user level. So the user can ask us to mark down file:///data/1 or something like that. That way, when we later implement volumes which aren't backed by files, we can easily refer to them.

          Also, how do you feel about disableVolume instead of markDownVolume? "markdown" just makes me think of the markup language (maybe that's just me?)

          I think we need a way of listing all the volume URIs on a particular DN, and a way of listing all the currently disabled volume URIs on a DN. Otherwise it makes life hard for sysadmins.
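
          A hypothetical shape for such an API – the interface name, method names, and placement are illustrative only and not part of any attached patch:

            import java.io.IOException;
            import java.net.URI;

            interface VolumeAdmin {
              /** Stop using the volume identified by the given URI, e.g. file:///data/1. */
              void disableVolume(URI volume) throws IOException;

              /** List the URIs of all volumes configured on this DataNode. */
              URI[] listVolumes() throws IOException;

              /** List the URIs of the volumes currently disabled on this DataNode. */
              URI[] listDisabledVolumes() throws IOException;
            }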

          Another comment: we need to notify outstanding BlockReaderLocal instances that the volume is disabled. We can do this by using the shared memory segment. This should avoid the problem that JD pointed out here:

          We tried it today, it worked fine. We did encounter an interesting problem tho, the region server on the same node continued to use that disk directly since it's configured with local reads.

          If you want to do this in a follow-up change then file a JIRA for that?

          yzhangal Yongjun Zhang added a comment -

          Hi Jimmy Xiang, thanks for your earlier work on this issue. I wonder if you will have time to work on this? If not, do you mind if I take it over? Thanks.

          jxiang Jimmy Xiang added a comment -

          Sure. Assigned it to you.

          yzhangal Yongjun Zhang added a comment -

          Thanks Jimmy.

          yzhangal Yongjun Zhang added a comment -

          Hi Stack,

          This issue turned out to be a duplicate of HDFS-1362, which is resolved now.

          I'm closing this JIRA as a duplicate. Please re-open if you think there are additional issues to be addressed.

          Thanks.

          yzhangal Yongjun Zhang added a comment -

          HI Harsh J,

          My bad that I did not notice your earlier comment

          I just noticed Steve's comment referring to the same – should've gone through the thread properly before spending Google cycles. I feel HDFS-1362, implemented, would solve half of this – and the other half would be to make the removals automatic. Right now checkDiskError does not eject a disk if it's slow, as long as it succeeds; that part would have to be done via this JIRA I think. The re-add would be possible via HDFS-1362.

          until now. So we need to use the functionality provided by HDFS-1362 to automatically remove a sick disk. It seems the original goal of HDFS-4239 is the same as HDFS-1362 (right?), and we can create a new JIRA for automatically removing a sick disk?

          Thanks.


            People

            • Assignee:
              yzhangal Yongjun Zhang
              Reporter:
              stack stack
            • Votes:
              0
              Watchers:
              25
