FLUME-3085: HDFS Sink can skip flushing some BucketWriters, might lead to data loss


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.8.0
    • Component/s: Sinks+Sources
    • Labels: None

    Description

      HDFSEventSink.process() is already prepared for a rare race condition, namely when the BucketWriter acquired in line 389 gets closed by another thread (e.g. because the idleTimeout or the rollInterval elapsed) before append() is called in line 406.
      When this happens, the BucketWriter.append() call throws a BucketClosedException, and the sink creates a new BucketWriter instance and appends to it.
      However, this newly created instance is not added to the writers list, which means it will not be flushed after the processing loop finishes: https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java#L429
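      The pattern can be sketched with a minimal, self-contained simulation. MiniBucketWriter, processBatch, and the trackReplacement flag below are hypothetical stand-ins for Flume's BucketWriter, HDFSEventSink.process(), and the fix respectively, not the real Flume API; with trackReplacement set to false, the replacement writer escapes the flush loop exactly as described above.

```java
import java.util.ArrayList;
import java.util.List;

public class HdfsSinkFlushSketch {

    // Stand-in for org.apache.flume.sink.hdfs.BucketClosedException.
    static class BucketClosedException extends RuntimeException {}

    // Greatly simplified stand-in for Flume's BucketWriter.
    static class MiniBucketWriter {
        boolean closed = false;
        boolean flushed = false;
        int appendedEvents = 0;

        void append(String event) {
            if (closed) {
                // Another thread closed this writer (idleTimeout / rollInterval).
                throw new BucketClosedException();
            }
            appendedEvents++;
        }

        void flush() {
            flushed = true;
        }
    }

    // Mirrors the shape of HDFSEventSink.process(): append events, and on
    // BucketClosedException create a fresh writer and retry the append.
    // trackReplacement == false reproduces the bug: the replacement writer
    // never enters the writers list, so the flush loop at the end skips it.
    static List<MiniBucketWriter> processBatch(List<String> events,
                                               MiniBucketWriter initial,
                                               boolean trackReplacement) {
        List<MiniBucketWriter> writers = new ArrayList<>();
        writers.add(initial);
        MiniBucketWriter current = initial;
        for (String event : events) {
            try {
                current.append(event);
            } catch (BucketClosedException e) {
                current = new MiniBucketWriter();  // replacement writer
                if (trackReplacement) {
                    writers.add(current);          // the fix: track it too
                }
                current.append(event);
            }
        }
        // Flush loop, analogous to HDFSEventSink.java line 429: only writers
        // present in the list get flushed.
        for (MiniBucketWriter w : writers) {
            w.flush();
        }
        return writers;
    }
}
```

      Run with the buggy flag, the replacement writer holds appended events that are never flushed; run with the fix, it is flushed together with the rest at the end of the batch.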

      This has multiple consequences, the most serious being that events appended to the untracked writer are never flushed, which can lead to data loss.


    People

      Assignee: Denes Arvay
      Reporter: Denes Arvay
      Votes: 0
      Watchers: 6
