Key: HADOOP-337
Type: New Feature
Status: Closed
Resolution: Duplicate
Priority: Major
Assignee: Sameer Paranjpye
Reporter: Runping Qi
Votes: 1
Watchers: 1
Hadoop Common

DFS files should be appendable

Created: 01/Jul/06 04:19 AM   Updated: 08/Jul/09 04:41 PM
Component/s: None
Affects Version/s: 0.1.0, 0.1.1, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.4.0
Fix Version/s: None

Time Tracking:
Not Specified

Environment: all
Issue Links:
Reference
 

Resolution Date: 28/Apr/08 08:11 PM


Description

This is actually two related issues:

1. One should be able to open an existing DFS file, seek to a position and truncate the rest, and append starting at the end (or at the point where the truncation happens).
2. One should be able to read the data already written to a DFS file while another client is writing or appending to it.
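
As a rough illustration of the two behaviours requested above, here is a minimal sketch using an ordinary local file. It assumes POSIX-style local file semantics and is only an analogy: it does not use or describe any DFS API, and the function names `truncate_and_append` and `read_while_writing` are hypothetical.

```python
import os
import tempfile


def truncate_and_append(path, position, data):
    """Drop everything after `position`, then write `data` from there."""
    with open(path, "r+b") as f:
        f.seek(position)
        f.truncate()          # with no argument, cuts at the current position
        f.write(data)
    return os.path.getsize(path)


def read_while_writing(path):
    """One writer appends and flushes; a separate reader sees each flush."""
    with open(path, "ab") as writer, open(path, "rb") as reader:
        writer.write(b"first ")
        writer.flush()
        seen_first = reader.read()    # data flushed so far
        writer.write(b"second")
        writer.flush()
        seen_second = reader.read()   # only the newly flushed data
    return seen_first, seen_second


# Small demo on a scratch file (local disk only, not a DFS).
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.dat")
    with open(demo, "wb") as f:
        f.write(b"0123456789")
    truncate_and_append(demo, 4, b"XY")
    with open(demo, "rb") as f:
        print(f.read())
```

On a single machine these semantics come almost for free. In a replicated, block-based file system the same operations have to be coordinated across replicas, which is where the complexity discussed in the comments comes from.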



p sutter added a comment - 01/Jul/06 06:17 AM

I'd suggest a strict prioritization of DFS changes (comments encouraged):

(1) Recoverability (not losing data in the presence of failures)
(2) Scalability
(3) Availability (staying up 24/7/365)
(4) Performance
(5) Features

Here features is a distant fifth, not really in the running.

It's worth noting that the elegant simplicity of HDFS is key to (1), (2), and (3) - we have a lot to gain by avoiding any and all complexity in HDFS.


Runping Qi added a comment - 01/Jul/06 06:50 AM

I don't mind prioritization and agree that reliability, scalability, and availability are all important. However, that does not mean we have to ban new features, especially features that are essential for some applications. For example, the feature requested in this issue is a must for one of our applications.


p sutter added a comment - 01/Jul/06 06:56 AM

I agree. Believe me, I love a new feature as much as the next guy. In fact, my temptation was to request multiple concurrent appenders!

Doug Cutting added a comment - 01/Jul/06 08:42 AM
Can you describe more about the application that requires this feature? I wonder if there might be a reasonable workaround. Previously I proposed using a directory of files instead of a single file for a related issue.
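
One way to picture the directory-of-files workaround mentioned above is the sketch below. It is an illustration on a local directory, not the approach from the related issue: it assumes a single writer, and the `part-NNNNN` naming and the helpers `append_part` and `read_all` are hypothetical. Each "append" creates a new write-once file, and a reader concatenates the parts in name order.

```python
import os
import tempfile


def append_part(directory, data):
    """Emulate an append by adding a new write-once part file."""
    os.makedirs(directory, exist_ok=True)
    index = len(os.listdir(directory))           # assumes a single writer
    name = os.path.join(directory, f"part-{index:05d}")
    with open(name, "xb") as f:                  # "x" fails if the part exists
        f.write(data)
    return name


def read_all(directory):
    """Read the logical file by concatenating parts in name order."""
    chunks = []
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), "rb") as f:
            chunks.append(f.read())
    return b"".join(chunks)


# Small demo on a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = os.path.join(tmp, "events")
    append_part(log_dir, b"first batch\n")
    append_part(log_dir, b"second batch\n")
    print(read_all(log_dir))
```

This keeps every file immutable once written, at the cost of many small files and no way to truncate in the middle of a part.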

eric baldeschwieler added a comment - 02/Jul/06 04:30 AM
We generally support Paul's prioritization. But we are also going to be moving up the tempo and volume of data we manage in the system, "A LOT". If we find a small list of features to be cost effective investments, we'd like to be able to give them back to the community.

If it makes sense for us to invest significant man-power to produce a patch for this and can validate that said patch does not destabilize large clusters over a significant period of operation, I'd assume this wouldn't be that controversial, since this is a very natural extension of the API that does seem eventually inevitable (along with multiple appenders, which would be very, very helpful). I think the thing we need to agree on is the testing criteria for acceptance, so we're not constantly derailing the community.

Do folks agree with the above? If so, let's put more energy into thinking about how we are going to establish a testing / validation regime that will support innovation, rather than trying to kill innovation for safety's sake. That way lies madness. We'll be happy to help staff / fund such a testing policy.

(Anyone want to create an "enhancement" to discuss HDFS extension validation?)


p sutter added a comment - 02/Jul/06 07:38 AM

If Doug isn't able to talk you out of it, then you must really need it. So give him a chance, but if it's OK with him it's probably a good thing to do. We'd love to have it, but my intuition is flashing complexity when I consider it. Especially with multiple appenders. Of course, you guys probably have a super elegant approach in mind.

My last thought: if that "other" elegant distributed replicated block-based filesystem has this feature, it's probably valuable. If it doesn't, then it's probably superfluous.


Sameer Paranjpye added a comment - 28/Apr/08 08:11 PM
Duplicate of HADOOP-1700