Flume / FLUME-985

All HDFS Operations in HDFSEventSink should have a timeout

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.0.0
    • Fix Version/s: v1.2.0
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      In FLUME-871, appends were made asynchronous so that they could be timed out. All other HDFS operations in HDFSEventSink should be handled the same way.
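The general pattern described here (run the blocking call on another thread, then wait on it with a deadline) can be sketched with the standard java.util.concurrent API. This is only an illustration under stated assumptions: the class name `TimedCall` and the method `callWithTimeout` are hypothetical and are not taken from the FLUME-871 or FLUME-985 patches.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not code from the Flume patch. */
final class TimedCall {
  private TimedCall() {}

  /**
   * Submits the call to the executor and waits at most timeoutMs for a result.
   * On timeout the task is cancelled (its thread is interrupted) and the
   * TimeoutException is rethrown so the caller can decide how to recover.
   */
  static <T> T callWithTimeout(ExecutorService executor, Callable<T> call, long timeoutMs)
      throws TimeoutException, ExecutionException, InterruptedException {
    Future<T> future = executor.submit(call);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // best effort: only interruptible calls will stop
      throw e;
    }
  }
}
```

A caller would wrap each blocking operation (open, append, flush, close) in a `Callable` and pass it through this helper. Note that cancellation only interrupts the worker thread, so a call that ignores interrupts may keep occupying that thread after the timeout.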

      1. FLUME-985-1.patch
        38 kB
        Brock Noland
      2. FLUME-985-0.patch
        34 kB
        Brock Noland

        Activity

        Brock Noland added a comment -

        Attaching the current patch.

        Brock Noland added a comment -

        Marking "Patch Available"

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3988/
        -----------------------------------------------------------

        Review request for Flume.

        Summary
        -------

        1) All HDFS actions are now done in async mode.
        2) If an HDFS append times out, the file is closed and reopened.
        3) Batching is now handled by BucketWriter, which was always aware of the batch size.
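
The close-and-reopen behavior in item 2 might look roughly like the sketch below. It is a guess at the shape of the logic, not the patch itself: `SketchWriter` and `ReopeningWriter` are hypothetical names, the numbered file suffix is an assumption, and rethrowing the timeout so the caller can retry the event is an assumed policy.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for an HDFS writer; not the real Flume interface. */
interface SketchWriter {
  void open(String path) throws Exception;
  void append(String event) throws Exception;
  void close() throws Exception;
}

/** Hypothetical sketch: every writer call gets a timeout; a timed-out append rolls the file. */
final class ReopeningWriter {
  private final SketchWriter writer;
  private final ExecutorService executor;
  private final long timeoutMs;
  private final String basePath;
  private int fileIndex = 0; // assumed naming scheme: basePath.0, basePath.1, ...

  ReopeningWriter(SketchWriter writer, ExecutorService executor, long timeoutMs, String basePath)
      throws Exception {
    this.writer = writer;
    this.executor = executor;
    this.timeoutMs = timeoutMs;
    this.basePath = basePath;
    run(() -> { writer.open(currentPath()); return null; });
  }

  String currentPath() {
    return basePath + "." + fileIndex;
  }

  void append(String event) throws Exception {
    try {
      run(() -> { writer.append(event); return null; });
    } catch (TimeoutException e) {
      // Close the stuck file on a best-effort basis, then open a fresh one.
      try {
        run(() -> { writer.close(); return null; });
      } catch (Exception ignored) {
        // The old file may be unusable; continue with the new one.
      }
      fileIndex++;
      run(() -> { writer.open(currentPath()); return null; });
      throw e; // assumed policy: the caller retries the event against the new file
    }
  }

  private <T> T run(Callable<T> call) throws Exception {
    Future<T> future = executor.submit(call);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true);
      throw e;
    }
  }
}
```

With this shape, a stalled append surfaces as a `TimeoutException`, and the next append goes to a newly opened file instead of blocking on the old one.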

        This addresses bug FLUME-985.
        https://issues.apache.org/jira/browse/FLUME-985

        Diffs
        -----

        flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 19b2559
        flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java 8a6740f
        flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 7d8ee8a
        flume-ng-sinks/flume-hdfs-sink/pom.xml f27851e
        flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 45769f6
        flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 3da90a5

        Diff: https://reviews.apache.org/r/3988/diff

        Testing
        -------

        1) Unit tests were added for the close/reopen scenario.
        2) All unit tests pass.
        3) I manually verified that this patch improves FlumeNG's behavior when the datanode it is writing to is restarted. In the past, FlumeNG had to be restarted; now it moves on and starts writing to a new file.

        Thanks,

        Brock

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3988/#review6220
        -----------------------------------------------------------

        Ship it!

        Sorry I didn't look at this earlier.
        Looks fine to me. Please check whether the code needs to be rebased.

        - Prasad

        In reply to Brock Noland's review request, as updated 2012-02-21 21:51:32 (https://reviews.apache.org/r/3988/).

        Brock Noland added a comment -

        Rebased patch is attached.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3988/
        -----------------------------------------------------------

        (Updated 2012-03-23 20:55:21.762184)

        Review request for Flume.

        Changes
        -------

        Rebased patch attached. Attaching to JIRA for commit.

        Summary
        -------

        1) All HDFS actions are now done in async mode.
        2) If an HDFS append times out, the file is closed and reopened.
        3) Batching is now handled by BucketWriter, which was always aware of the batch size.

        This addresses bug FLUME-985.
        https://issues.apache.org/jira/browse/FLUME-985

        Diffs (updated)
        ---------------

        flume-ng-sinks/flume-hdfs-sink/pom.xml bef2ca7
        flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 45769f6
        flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 1fdaddd
        flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 19b2559
        flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java 8a6740f
        flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadWriterFactory.java b067c00
        flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 8fa72a1

        Diff: https://reviews.apache.org/r/3988/diff

        Testing
        -------

        1) Unit tests were added for the close/reopen scenario.
        2) All unit tests pass.
        3) I manually verified that this patch improves FlumeNG's behavior when the datanode it is writing to is restarted. In the past, FlumeNG had to be restarted; now it moves on and starts writing to a new file.

        Thanks,

        Brock

        Arvind Prabhakar added a comment -

        Patch committed. Thanks Brock!

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3988/#review6311
        -----------------------------------------------------------

        Ship it!

        +1

        - Arvind

        In reply to Brock Noland's review request, as updated 2012-03-23 20:55:21 (https://reviews.apache.org/r/3988/).

        Brock Noland added a comment -

        Thanks!

        Hudson added a comment -

        Integrated in flume-trunk #143 (See https://builds.apache.org/job/flume-trunk/143/)
        FLUME-985. All HDFS Operations should have a timeout.

        (Brock Noland via Arvind Prabhakar) (Revision 1304600)

        Result = SUCCESS
        arvind : http://svn.apache.org/viewvc/?view=rev&rev=1304600
        Files :

        • /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/pom.xml
        • /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java
        • /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java
        • /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java
        • /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java
        • /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadWriterFactory.java
        • /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java

          People

          • Assignee: Brock Noland
          • Reporter: Brock Noland
          • Votes: 0
          • Watchers: 0
