[HADOOP-6450] Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: fs
Labels: None

Description

The current HDFS implementation has a limitation: it does not replicate the last partial block of a file that is being written until the file is closed. Some long-running applications (e.g. HBase) write transaction logs into HDFS. If datanodes in the write pipeline die, the application has no knowledge of it until all of the datanodes fail and the application gets an IO error.

These applications would benefit greatly if they could determine the number of live replicas of the block they are currently writing. For example, an application could decide that when one of the datanodes in the write pipeline fails, it will close the file and start writing to a new file.
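
The roll-on-failure policy described above could look roughly like the following. This is a minimal, self-contained sketch and not the API proposed in the attached patch: the interface `ReplicaCountAware`, its method `getNumCurrentReplicas()`, the `StreamOpener` factory, and the `RollingLogWriter` and `FakeStream` classes are all hypothetical names introduced for illustration. `FakeStream` is an in-memory stand-in so the example runs without a cluster.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical capability: a stream that can report live replicas of its current block. */
interface ReplicaCountAware {
    int getNumCurrentReplicas() throws IOException;
}

/** Hypothetical factory that opens the next log file. */
interface StreamOpener<S> {
    S open() throws IOException;
}

/** Closes the current file and rolls to a new one when the replica count drops. */
class RollingLogWriter<S extends OutputStream & ReplicaCountAware> {
    private final StreamOpener<S> opener;
    private final int minReplicas;
    private S current;
    private int filesOpened;

    RollingLogWriter(StreamOpener<S> opener, int minReplicas) throws IOException {
        this.opener = opener;
        this.minReplicas = minReplicas;
        this.current = opener.open();
        this.filesOpened = 1;
    }

    void append(byte[] record) throws IOException {
        // Check pipeline health before each write; roll if a replica was lost.
        if (current.getNumCurrentReplicas() < minReplicas) {
            current.close();
            current = opener.open();
            filesOpened++;
        }
        current.write(record);
        current.flush();
    }

    int filesOpened() {
        return filesOpened;
    }

    void close() throws IOException {
        current.close();
    }
}

/** In-memory fake used only to exercise the sketch; the replica count is set by hand. */
class FakeStream extends OutputStream implements ReplicaCountAware {
    final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    int replicas;
    boolean closed;

    FakeStream(int replicas) {
        this.replicas = replicas;
    }

    @Override
    public void write(int b) {
        buf.write(b);
    }

    @Override
    public int getNumCurrentReplicas() {
        return replicas;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

In this sketch the check runs before every append, so a lost replica is noticed on the next write rather than only when the whole pipeline has failed. A real application would also need to decide how often polling the replica count is affordable.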

Attachments

Replicable.txt, 18/Dec/09 14:55, 2 kB, Dhruba Borthakur
Replicable.txt, 18/Dec/09 14:58, 2 kB, Dhruba Borthakur

Issue Links

blocks: HDFS-826 Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline (Closed)

People

Assignee: Dhruba Borthakur

Reporter: Dhruba Borthakur

Votes: 0

Watchers: 6

Dates

Created: 16/Dec/09 23:59

Updated: 22/May/10 01:56

Resolved: 22/May/10 01:56