Flume / FLUME-3085

HDFS Sink can skip flushing some BucketWriters, might lead to data loss

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.8.0
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      HDFSEventSink.process() already handles a rare race condition: the BucketWriter acquired in line 389 can be closed by another thread (e.g. because of the idleTimeout or the rollInterval) before append() is called in line 406.
      When this happens, BucketWriter.append() throws a BucketClosedException, and the sink creates a new BucketWriter instance and appends the event to it.
      However, this newly created instance is not added to the writers list, so it is not flushed after the processing loop finishes: https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java#L429
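      To make the failure mode concrete, here is a minimal, single-threaded Java sketch of the control flow described above. All classes (BucketWriterModel, the local BucketClosedException) and the runBatch() helper are hypothetical stand-ins, not Flume's real API. The concurrent close by an idleTimeout or rollInterval thread is simulated by calling close() between the lookup and the append().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the buggy HDFSEventSink.process() flow.
// None of these classes are Flume's real implementation.
public class BucketFlushBug {

  // Local stand-in for Flume's BucketClosedException.
  static class BucketClosedException extends RuntimeException {
    BucketClosedException(String msg) {
      super(msg);
    }
  }

  // Hypothetical writer: counts events appended since the last flush.
  static class BucketWriterModel {
    final String path;
    boolean closed = false;
    int pending = 0;

    BucketWriterModel(String path) {
      this.path = path;
    }

    void append(String event) {
      if (closed) {
        throw new BucketClosedException(path + " is closed");
      }
      pending++;
    }

    void flush() {
      pending = 0;
    }

    void close() {
      closed = true;
    }
  }

  // Processes a one-event batch and returns the events left unflushed.
  static int runBatch() {
    Map<String, BucketWriterModel> sfWriters = new HashMap<>();
    List<BucketWriterModel> writers = new ArrayList<>(); // flushed after the loop
    String lookupPath = "/flume/events"; // hypothetical bucket path

    BucketWriterModel bucketWriter =
        sfWriters.computeIfAbsent(lookupPath, BucketWriterModel::new);
    if (!writers.contains(bucketWriter)) {
      writers.add(bucketWriter);
    }

    // Simulated race: another thread closes the writer between lookup and append.
    bucketWriter.close();

    BucketWriterModel active;
    try {
      bucketWriter.append("event-1");
      active = bucketWriter;
    } catch (BucketClosedException ex) {
      // The replacement is stored in the map and receives the event...
      BucketWriterModel replacement = new BucketWriterModel(lookupPath);
      sfWriters.put(lookupPath, replacement);
      replacement.append("event-1");
      active = replacement;
      // ...but, as in the reported bug, it is never added to `writers`.
    }

    for (BucketWriterModel w : writers) {
      w.flush(); // only the original, already-closed writer is flushed
    }
    return active.pending;
  }

  public static void main(String[] args) {
    System.out.println("unflushed events: " + runBatch());
  }
}
```

      In this model runBatch() returns 1: the event was appended to the replacement writer, but only the original (closed) writer is in the flush list, so the event stays unflushed.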

      This has multiple consequences:


          Activity

          Hudson added a comment:

          UNSTABLE: Integrated in Jenkins build Flume-trunk-hbase-1 #247 (See https://builds.apache.org/job/Flume-trunk-hbase-1/247/)
          FLUME-3085. HDFS Sink can skip flushing some BucketWriters, might lead (mpercy: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=ed433ae1b12d40117ca3dca2ffd57389984ede72)

          • (edit) flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java
          • (edit) flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java
          Mike Percy added a comment:

          Pushed to trunk.

          ASF GitHub Bot added a comment:

          Github user asfgit closed the pull request at:

          https://github.com/apache/flume/pull/129

          ASF subversion and git services added a comment:

          Commit ed433ae1b12d40117ca3dca2ffd57389984ede72 in flume's branch refs/heads/trunk from Denes Arvay
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=ed433ae ]

          FLUME-3085. HDFS Sink can skip flushing some BucketWriters, might lead to data loss

          This commit fixes the issue when in HDFSEventSink.process() a BucketWriter.append()
          call threw a BucketClosedException then the newly created BucketWriter wasn't
          flushed after the processing loop.

          This closes #129

          Reviewers: Attila Simon, Mike Percy

          (Denes Arvay via Mike Percy)
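          The commit message describes the fix only in prose. One way to express the idea, sketched under a hypothetical model and not taken from the actual patch, is to register whichever writer actually received the event rather than the writer that was looked up before the append. BucketWriterModel, appendWithRetry() and runBatch() are illustrative names; the flush set is a LinkedHashSet here purely as an assumption to avoid flushing one writer twice.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the fixed pattern; not the actual Flume patch.
public class BucketFlushFix {

  // Local stand-in for Flume's BucketClosedException.
  static class BucketClosedException extends RuntimeException {
    BucketClosedException(String msg) {
      super(msg);
    }
  }

  // Hypothetical writer: counts events appended since the last flush.
  static class BucketWriterModel {
    final String path;
    boolean closed;
    int pending;

    BucketWriterModel(String path) {
      this.path = path;
    }

    void append(String event) {
      if (closed) {
        throw new BucketClosedException(path + " is closed");
      }
      pending++;
    }

    void flush() {
      pending = 0;
    }
  }

  // Appends the event, replacing a writer that was closed concurrently.
  // Returns the writer that actually holds the event.
  static BucketWriterModel appendWithRetry(
      Map<String, BucketWriterModel> sfWriters, String lookupPath, String event) {
    BucketWriterModel writer = sfWriters.computeIfAbsent(lookupPath, BucketWriterModel::new);
    try {
      writer.append(event);
      return writer;
    } catch (BucketClosedException ex) {
      BucketWriterModel replacement = new BucketWriterModel(lookupPath);
      sfWriters.put(lookupPath, replacement);
      replacement.append(event);
      return replacement;
    }
  }

  // Runs a two-event batch and returns the events left unflushed in the map.
  static int runBatch() {
    Map<String, BucketWriterModel> sfWriters = new HashMap<>();
    Set<BucketWriterModel> toFlush = new LinkedHashSet<>(); // deduplicated, ordered
    String lookupPath = "/flume/events"; // hypothetical bucket path

    // A pre-existing writer that another thread has already closed.
    BucketWriterModel stale = new BucketWriterModel(lookupPath);
    stale.closed = true;
    sfWriters.put(lookupPath, stale);

    for (String event : new String[] {"event-1", "event-2"}) {
      // Register the writer that received the event, including a replacement.
      toFlush.add(appendWithRetry(sfWriters, lookupPath, event));
    }

    for (BucketWriterModel w : toFlush) {
      w.flush();
    }

    int unflushed = 0;
    for (BucketWriterModel w : sfWriters.values()) {
      unflushed += w.pending;
    }
    return unflushed;
  }
}
```

          With this shape the replacement writer ends up in the flush set, so runBatch() reports 0 unflushed events. Returning the writer from the append helper keeps the "which writer must be flushed" decision in one place, so a retry path cannot silently bypass it.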

          ASF GitHub Bot added a comment:

          GitHub user adenes opened a pull request:

          https://github.com/apache/flume/pull/129

          FLUME-3085: HDFS Sink can skip flushing some BucketWriters, might lead to data loss

          This commit fixes the issue when in `HDFSEventSink.process()` a `BucketWriter.append()` call threw a `BucketClosedException` then the newly created `BucketWriter` wasn't flushed after the processing loop.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/adenes/flume FLUME-3085

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flume/pull/129.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #129


          commit f775a629c40bf8373cf3c0a991ea8738e2989c39
          Author: Denes Arvay <denes@cloudera.com>
          Date: 2017-04-20T13:58:47Z

          FLUME-3085: HDFS Sink can skip flushing some BucketWriters, might lead to data loss

          This commit fixes the issue when in HDFSEventSink.process() a BucketWriter.append()
          call threw a BucketClosedException then the newly created BucketWriter wasn't
          flushed after the processing loop.



            People

            • Assignee:
              denes Denes Arvay
              Reporter:
              denes Denes Arvay
            • Votes:
              0
              Watchers:
              6
