[HADOOP-1700] Append to files in HDFS - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.15.1
Fix Version/s: 0.19.0
Component/s: None
Labels: None
Hadoop Flags: Incompatible change, Reviewed
Release Note: Introduced append operation for HDFS files.

Description

Requests to be able to append to files in HDFS have come up several times on the list of late. For one example, see http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193. Other mail describes the workarounds folks use because this feature is lacking, e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480 (later in that thread, Jim Kellerman re-raises HBase's need for this feature). ~~HADOOP-337~~ 'DFS files should be appendable' mentions file append, but it was opened early in the life of HDFS, when the focus was on implementing the basics rather than adding new features, and interest fizzled. ~~HADOOP-337~~ is also a bit of a grab-bag (it includes truncation and being able to concurrently read/write), so rather than try to breathe new life into it, here is a new issue focused on file append. Ultimately, being able to do as the Google GFS paper describes, with multiple concurrent clients making 'Atomic Record Append' to a single file, would be sweet. For a first cut at this feature, though, IMO a single client appending to a single HDFS file, with the application managing access, would be sufficient.

Attachments

| File | Uploaded | Size | Uploaded by |
| --- | --- | --- | --- |
| Grid_HadoopRenumberBlocks.pdf | 07/May/08 22:53 | 74 kB | Robert Chansler |
| appendtrunk9.patch | 14/Jul/08 21:09 | 79 kB | Tsz-wo Sze |
| appendtrunk8.patch | 10/Jul/08 02:43 | 76 kB | Tsz-wo Sze |
| appendtrunk7.patch | 08/Jul/08 22:36 | 73 kB | Dhruba Borthakur |
| appendtrunk6.patch | 07/Jul/08 22:58 | 73 kB | Dhruba Borthakur |
| appendtrunk16.patch | 24/Jul/08 13:24 | 80 kB | Dhruba Borthakur |
| appendtrunk15.patch | 19/Jul/08 01:25 | 80 kB | Dhruba Borthakur |
| appendtrunk14.patch | 17/Jul/08 00:40 | 80 kB | Dhruba Borthakur |
| appendtrunk14.patch | 17/Jul/08 01:02 | 79 kB | Dhruba Borthakur |
| appendtrunk13.patch | 16/Jul/08 01:39 | 64 kB | Dhruba Borthakur |
| appendtrunk13.patch | 16/Jul/08 20:15 | 79 kB | Dhruba Borthakur |
| appendtrunk13.patch | 16/Jul/08 21:44 | 79 kB | Dhruba Borthakur |
| appendtrunk12.patch | 15/Jul/08 19:03 | 80 kB | Tsz-wo Sze |
| appendtrunk11.patch | 15/Jul/08 18:21 | 80 kB | Tsz-wo Sze |
| appendtrunk10.patch | 15/Jul/08 00:56 | 81 kB | Tsz-wo Sze |
| Appends.html | 16/Nov/07 06:40 | 45 kB | Dhruba Borthakur |
| Appends.doc | 16/Nov/07 06:39 | 70 kB | Dhruba Borthakur |
| Appends.doc | 29/Nov/07 17:53 | 76 kB | Dhruba Borthakur |
| append3.patch | 29/Jun/08 10:49 | 55 kB | Dhruba Borthakur |
| append.patch | 26/Dec/07 02:49 | 22 kB | Ruyue Ma |
| 1700_20080606.patch | 06/Jun/08 22:10 | 14 kB | Tsz-wo Sze |

Issue Links

depends upon

- HADOOP-3283 Need a mechanism for data nodes to update generation stamps. (Closed)
- HADOOP-3310 Lease recovery for append (Closed)

is blocked by

- HADOOP-2565 DFSPath cache of FileStatus can become stale (Resolved)
- HADOOP-2655 Copy on write for data and metadata files in the presence of snapshots (Closed)
- HADOOP-2656 Support for upgrading existing cluster to facilitate appends to HDFS files (Closed)
- HADOOP-3113 DFSOututStream.flush() should flush data to real block file on DataNode. (Closed)
- HADOOP-3176 Change lease record when a open-for-write-file gets renamed (Closed)
- HADOOP-3503 Race condition when client and namenode start block recovery simultaneously (Closed)
- HADOOP-3161 TestFileAppend fails on Mac since HADOOP-2655 was committed (Closed)
- HADOOP-2658 Design and Implement a Test Plan to support appends to HDFS files (Closed)
- HADOOP-3201 namenode should be able to retrieve block metadata from a datanode (Closed)
- HADOOP-3250 Extend FileSystem API to allow appending to files (Closed)
- HADOOP-1707 Remove the DFS Client disk-based cache (Closed)
- HADOOP-2345 new transactions to support HDFS Appends (Closed)
- HADOOP-3177 Expose DFSOutputStream.fsync API though the FileSystem interface (Closed)
- HADOOP-3515 Protocol changes to allow appending to the last partial crc chunk of a file (Closed)

is depended upon by

- HADOOP-3790 Add more unit tests to test appending to files in HDFS (Closed)

is related to

- HDFS-200 In HDFS, sync() not yet guarantees data available to the new readers (Closed)
- HADOOP-337 DFS files should be appendable (Closed)

is superceded by

- HDFS-265 Revisit append (Closed)

relates to

- HADOOP-3834 Checkin the design document for HDFS appends into source control repository (Resolved)
- HADOOP-89 files are not visible until they are closed (Closed)
- HADOOP-1497 Possibility of duplicate blockids if dead-datanodes come back up after corresponding files were deleted (Closed)
- HADOOP-3241 DFSFileInfo should also have field to say if the file is underconstrction (Closed)
- HADOOP-3329 DatanodeDescriptor objects stored in FSImage may be out dated. (Closed)
- HADOOP-3832 Create more unit tests for testing HDFS appends (Closed)
- HADOOP-2657 Enhancements to DFSClient to support flushing data at any point in time (Closed)
- HDFS-2253 Create a benchmark to measure performance of "append" to HDFS files (Resolved)

People

Assignee: Dhruba Borthakur

Reporter: Michael Stack

Votes: 11

Watchers: 28

Dates

Created: 09/Aug/07 18:45

Updated: 07/Jan/16 18:37

Resolved: 25/Jul/08 18:09