[HADOOP-3113] DFSOutputStream.flush() should flush data to real block file on DataNode. - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.18.0
Component/s: None
Labels: None
Hadoop Flags: Incompatible change, Reviewed
Release Note:
Added a sync() method to FSDataOutputStream to really, really persist data in HDFS. InterDatanodeProtocol is used to implement this feature.

Description

DFSOutputStream has a method called flush() that persists block locations on the namenode and sends all outstanding data to all datanodes in the pipeline. However, this data goes to the tmp file on the datanode(s). When the block is closed, the tmp file is renamed to become the real block file. If the datanode(s) die before the block is complete, the entire block is lost. This behaviour will be fixed in HADOOP-1700.
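To make the failure mode concrete, here is a minimal in-memory model of the datanode behaviour described above. It assumes a hypothetical `ToyDatanode` class, not Hadoop's actual DataNode code: flushed bytes land in a per-block tmp buffer, only closing the block promotes them to a finalized block, and a crash discards every tmp buffer.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; not Hadoop's DataNode implementation.
class ToyDatanode {
    // Blocks still being written: the "tmp file" for each block id.
    private final Map<Long, ByteArrayOutputStream> tmp = new HashMap<>();
    // Finalized blocks: the "real block file" for each block id.
    private final Map<Long, byte[]> finalized = new HashMap<>();

    // A client flush() delivers outstanding bytes, but only into the tmp buffer.
    void receive(long blockId, byte[] data) {
        tmp.computeIfAbsent(blockId, id -> new ByteArrayOutputStream())
           .write(data, 0, data.length);
    }

    // Closing the block "renames" the tmp file to the real block file.
    void closeBlock(long blockId) {
        ByteArrayOutputStream buf = tmp.remove(blockId);
        if (buf != null) {
            finalized.put(blockId, buf.toByteArray());
        }
    }

    // A crash loses everything that was never finalized.
    void crash() {
        tmp.clear();
    }

    // Length of the finalized replica, or -1 if none exists.
    long finalizedLength(long blockId) {
        byte[] b = finalized.get(blockId);
        return b == null ? -1 : b.length;
    }
}
```

Receiving bytes for a block and then calling `crash()` before `closeBlock()` leaves `finalizedLength` at -1: the data reached the datanode but was still lost.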

However, in the short term, a configuration parameter can be used to let datanodes write directly to the real block file, bypassing the tmp file. This means that data a client has successfully flushed is not lost even if the datanode(s) or the client die.
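A sketch of this short-term alternative, assuming a hypothetical boolean flag in place of the real configuration parameter (whose name this issue does not give): the writer targets either `<block>.tmp` or the real block file, and after a forced write the bytes survive an abandoned stream in the real block file only in the direct-write case. The `ToyBlockWriter` class and the file names are illustrative, not Hadoop's block-receiving code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; not Hadoop's datanode block-writing code.
class ToyBlockWriter {
    private final Path blockFile;
    private final Path target;
    private final FileChannel channel;

    ToyBlockWriter(Path dir, String blockName, boolean writeDirectToBlockFile) throws IOException {
        this.blockFile = dir.resolve(blockName);
        // With the flag off, data goes to a tmp file that is renamed on close.
        this.target = writeDirectToBlockFile ? blockFile : dir.resolve(blockName + ".tmp");
        this.channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Append bytes and force them to stable storage, as a client flush would.
    void writeAndFlush(byte[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(false);
    }

    // Normal completion: close and, if needed, promote the tmp file.
    void close() throws IOException {
        channel.close();
        if (!target.equals(blockFile)) {
            Files.move(target, blockFile);
        }
    }

    // Simulated crash: the channel is released but no rename ever happens.
    void abandon() throws IOException {
        channel.close();
    }

    Path blockFile() {
        return blockFile;
    }
}
```

With the flag on, `abandon()` after `writeAndFlush()` still leaves the flushed bytes in the real block file; with it off, the real block file never appears.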

The namenode already has code to pick the largest replica if multiple datanodes report different lengths for the same block. The namenode also does not trigger replication requests for a file that is still being written.
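The two namenode rules mentioned above can be sketched as follows. `ToyReplica`, `chooseReplica`, and `needsReplication` are hypothetical names for illustration, not the namenode's actual methods; the sketch only shows the selection logic: keep the longest reported replica, and skip replication while the file is still open for writing.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of one datanode's report for a block.
final class ToyReplica {
    final String datanode;
    final long length;

    ToyReplica(String datanode, long length) {
        this.datanode = datanode;
        this.length = length;
    }
}

// Hypothetical policy helpers; not the namenode's real API.
final class ToyReplicaPolicy {
    // Among replicas of the same block, prefer the one with the most bytes.
    static Optional<ToyReplica> chooseReplica(List<ToyReplica> reported) {
        return reported.stream().max(Comparator.comparingLong((ToyReplica r) -> r.length));
    }

    // Never ask for extra replicas while the file is still being written.
    static boolean needsReplication(boolean underConstruction, int liveReplicas, int targetReplication) {
        return !underConstruction && liveReplicas < targetReplication;
    }
}
```

Picking the longest replica is safe here because every replica in the pipeline receives the same byte stream, so a shorter replica is a prefix of a longer one.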

The only caveat I can think of is that the block report interval should be much smaller than the lease timeout period. A block report adds blocks that are still being written to the blocksMap, thereby preventing any cleanup that lease-expiry processing might otherwise have done.
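A rough way to see why the report interval matters, under simplifying assumptions that are ours rather than the issue's: block reports arrive at fixed multiples of a hypothetical `reportIntervalMs` starting at time 0, and lease-expiry processing cleans up any being-written block the namenode has not yet learned about from a report. The block then survives only if the first report at or after its creation time lands no later than the lease expiry.

```java
// Hypothetical timing model; names and the fixed report schedule are illustrative.
final class ToyReportTiming {
    // Time of the first block report at or after createdAtMs (reports at 0, I, 2I, ...).
    static long firstReportAfter(long createdAtMs, long reportIntervalMs) {
        if (createdAtMs < 0 || reportIntervalMs <= 0) {
            throw new IllegalArgumentException("times must be non-negative, interval positive");
        }
        // Ceiling division rounds up to the next multiple of the interval.
        return ((createdAtMs + reportIntervalMs - 1) / reportIntervalMs) * reportIntervalMs;
    }

    // True if a block report puts the block into blocksMap before the lease expires.
    static boolean survivesLeaseExpiry(long createdAtMs, long leaseExpiryMs, long reportIntervalMs) {
        return firstReportAfter(createdAtMs, reportIntervalMs) <= leaseExpiryMs;
    }
}
```

In the worst case the block is created just after a report, so the gap can approach a full `reportIntervalMs`; keeping that interval far below the lease timeout is what makes this race unlikely.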

This approach does not support all the requirements of HADOOP-1700, but it could still be useful in the short term for a wide range of applications.

Attachments

- noTmpFile.patch (24/May/08 08:21, 81 kB, Dhruba Borthakur)
- noTmpFile.patch (30/Mar/08 08:01, 2 kB, Dhruba Borthakur)
- tmpFile.patch (04/Jun/08 06:27, 10 kB, Dhruba Borthakur)
- tmpFile.patch (03/Jun/08 21:06, 10 kB, Dhruba Borthakur)
- tmpFile.patch (03/Jun/08 19:55, 9 kB, Dhruba Borthakur)
- tmpFile.patch (03/Jun/08 19:28, 9 kB, Dhruba Borthakur)

Issue Links

- blocks: HADOOP-1700 Append to files in HDFS (Closed)
- is blocked by: HADOOP-3310 Lease recovery for append (Closed)
- relates to: HADOOP-4961 ConcurrentModificationException in lease recovery of empty files. (Closed)

People

Assignee: Dhruba Borthakur
Reporter: Dhruba Borthakur

Votes: 1
Watchers: 4

Dates

Created: 27/Mar/08 22:32
Updated: 08/Jul/09 16:43
Resolved: 04/Jun/08 17:54