FLUME-503: Use HDFS sync API instead of rolling for durability

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9.4
    • Fix Version/s: 0.9.5
    • Component/s: Sinks+Sources
    • Labels: None

    Description

      Some versions of Hadoop (CDH3>b2 or the 0.20-append branch) support a sync() API that guarantees data has been flushed to all of the nodes in the write pipeline. This should be as durable as closing an HDFS file.

      Flume should allow the use of sync() to make data durable on a regular basis without having to create lots of tiny files on HDFS.
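
      A minimal sketch of the idea, not Flume's or Hadoop's actual code: the writer below calls sync() after a fixed number of events and keeps the same file open instead of rolling. SyncableSink, PeriodicSyncWriter, CountingSink and the syncEvery parameter are hypothetical names introduced for illustration. The interface stands in for an HDFS output stream that exposes sync(), and real stream calls would throw IOException.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an HDFS output stream that exposes sync(). */
interface SyncableSink {
    void write(byte[] event);
    void sync(); // assumed to flush data to every node in the write pipeline
}

/** Makes data durable every `syncEvery` events without closing the file. */
class PeriodicSyncWriter {
    private final SyncableSink out;
    private final int syncEvery;
    private int unsynced = 0;

    PeriodicSyncWriter(SyncableSink out, int syncEvery) {
        if (syncEvery < 1) {
            throw new IllegalArgumentException("syncEvery must be >= 1");
        }
        this.out = out;
        this.syncEvery = syncEvery;
    }

    /** Writes one event and syncs once enough events have accumulated. */
    void append(byte[] event) {
        out.write(event);
        unsynced++;
        if (unsynced >= syncEvery) {
            flush();
        }
    }

    /** Syncs any events written since the last sync; no-op if there are none. */
    void flush() {
        if (unsynced == 0) {
            return;
        }
        out.sync();
        unsynced = 0;
    }

    int unsyncedCount() {
        return unsynced;
    }
}

/** In-memory fake used only to exercise the sketch. */
class CountingSink implements SyncableSink {
    final List<byte[]> events = new ArrayList<>();
    int syncCalls = 0;

    public void write(byte[] event) { events.add(event); }
    public void sync() { syncCalls++; }
}
```

      A time-based trigger could replace the event counter. Either way, the number of files on HDFS no longer depends on how often data is made durable.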

      A related feature is the ability to use the getNumCurrentReplicas() API to detect when the number of replicas falls below the desired replication factor, and to roll at that point (to pick up a new DN).
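
      The replica check could be sketched as a small roll policy. ReplicaAwareFile and ReplicaRollPolicy are hypothetical names. The interface only mirrors the getNumCurrentReplicas() call named above, and how that call is reached on a real HDFS stream is not shown here.

```java
/** Hypothetical view of an open HDFS file that can report its live replica count. */
interface ReplicaAwareFile {
    int getNumCurrentReplicas();
}

/** Decides whether to roll to a new file so a fresh pipeline (and DN) is picked up. */
class ReplicaRollPolicy {
    private final int desiredReplication;

    ReplicaRollPolicy(int desiredReplication) {
        if (desiredReplication < 1) {
            throw new IllegalArgumentException("desiredReplication must be >= 1");
        }
        this.desiredReplication = desiredReplication;
    }

    /** True when the file currently has fewer replicas than desired. */
    boolean shouldRoll(ReplicaAwareFile file) {
        return file.getNumCurrentReplicas() < desiredReplication;
    }
}
```

      A sink would consult shouldRoll() after each sync and only close and reopen the file when it returns true, so rolling becomes the exception, not the durability mechanism.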

            People

              Assignee: Unassigned
              Reporter: flume_todd (Disabled imported user)
              Votes: 2
              Watchers: 4
