Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
- Affects Version/s: 0.9.4
- Fix Version/s: None
Description
Some versions of Hadoop (CDH3 b2 or later, or the 0.20-append branch) support a sync() API that guarantees data has been flushed to all of the nodes in the write pipeline. This should be as durable as closing an HDFS file.
Flume should allow the use of sync() to make data durable on a regular basis without having to create lots of tiny files on HDFS.
Relatedly, Flume could use the getNumCurrentReplicas() API to detect when the number of replicas in the write pipeline falls below the desired replication factor, and roll the file at that point (to pick up a new datanode).
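The two APIs described above could be combined roughly as follows. This is a hedged sketch, not Flume's implementation: the class name HdfsSyncSketch and method names are hypothetical, and getNumCurrentReplicas() only exists on append-capable Hadoop branches, so it is accessed reflectively (the pattern HBase used for the same problem). Running it requires the Hadoop client jars and a live HDFS cluster.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import org.apache.hadoop.fs.FSDataOutputStream;

/** Hypothetical sketch: keep one HDFS file open, sync it periodically,
 *  and roll only when the write pipeline degrades. */
public class HdfsSyncSketch {

  /** Flush the write pipeline. On append-capable branches this makes the
   *  buffered data as durable as closing the file, without creating a
   *  new tiny file on each flush. */
  static void periodicSync(FSDataOutputStream out) throws IOException {
    out.sync(); // later Hadoop versions replace this with hflush()
  }

  /** Check, via reflection, whether the pipeline has fewer live replicas
   *  than desired; if so the caller should roll to a new file so the
   *  namenode assigns a fresh pipeline with a replacement datanode. */
  static boolean shouldRoll(FSDataOutputStream out, short desiredReplication) {
    try {
      Object wrapped = out.getWrappedStream();
      Method m = wrapped.getClass().getMethod("getNumCurrentReplicas");
      int replicas = ((Integer) m.invoke(wrapped)).intValue();
      return replicas < desiredReplication;
    } catch (NoSuchMethodException e) {
      return false; // API absent on this Hadoop version; never roll early
    } catch (Exception e) {
      return false; // be conservative on reflection failures
    }
  }
}
```

The reflective lookup is what makes the feature safe across Hadoop versions: on builds without the append branch, shouldRoll() simply degrades to never triggering an early roll.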