Flume
  1. Flume
  2. FLUME-1295

RollingFileSink need to be able to construct directory path based on escape sequence

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: v1.2.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      Currently HDFSEventSink has an ability to construct the directory path based on escape sequence. It would be beneficial to have similar patch for RollingFileSink for people who just wants to use regular file system.

      1. FLUME-1295-B.patch
        53 kB
        Ted Malaska
      2. FLUME-1295-A.patch
        47 kB
        Ted Malaska

        Issue Links

          Activity

          Hide
          Ted Malaska added a comment -

          Status:

          Reviewed HDFSEventSink with respect to "escape sequence" and RollingFileSink.

          Here are some of the things I found:
          1. Looks like we will be able to reuse BucketPath, which is good.
          2. HDFSEventSink has a Object called BucketWriter. One of the major things this BucketWriter does is manage the rollover lifecycle of the file.
          3. RollingFileSink handles all the rollover lifecycle it self.

          My conclusions:
          1. To complete 1295 the RollingFileSink will need to have something link a BucketWriter.
          2. I would hate to make two bucket writers. So I will connect with the Flume brain trust and see if we can build a bucketWriter that can be extended to handle HDFS and normal filesystems.

          Let me know what you think

          Show
          Ted Malaska added a comment - Status: Reviewed HDFSEventSink with respect to "escape sequence" and RollingFileSink. Here are some of the things I found: 1. Looks like we will be able to reuse BucketPath, which is good. 2. HDFSEventSink has a Object called BucketWriter. One of the major things this BucketWriter does is manage the rollover lifecycle of the file. 3. RollingFileSink handles all the rollover lifecycle it self. My conclusions: 1. To complete 1295 the RollingFileSink will need to have something link a BucketWriter. 2. I would hate to make two bucket writers. So I will connect with the Flume brain trust and see if we can build a bucketWriter that can be extended to handle HDFS and normal filesystems. Let me know what you think
          Hide
          Ted Malaska added a comment -

          Status: Initial Plan

          Take BucketWriter and separate it by functionality: lifecycle logic and HDFS IO. Then make the following files.

          AbstractBucketWriter : all the lifecycle logic
          HDFSBucketWriter: HDFS IO
          FileBicketWriter: File IO

          Actions that I will be taking.
          > Add AbstractBucketWriter in org.apache.flume.sink in flume-ng-core
          > Update BucketWriter to extend abstractBucketWriter
          > Add FileBucketWriter in org.apache.flume.sink in flume-ng-core
          > Rename BucketWriter to HDFSBucketWriter

          Show
          Ted Malaska added a comment - Status: Initial Plan Take BucketWriter and separate it by functionality: lifecycle logic and HDFS IO. Then make the following files. AbstractBucketWriter : all the lifecycle logic HDFSBucketWriter: HDFS IO FileBicketWriter: File IO Actions that I will be taking. > Add AbstractBucketWriter in org.apache.flume.sink in flume-ng-core > Update BucketWriter to extend abstractBucketWriter > Add FileBucketWriter in org.apache.flume.sink in flume-ng-core > Rename BucketWriter to HDFSBucketWriter
          Hide
          Ted Malaska added a comment -

          Just to update:

          It's taking a longer then I hoped, but I just wanted to let you know I've got the code compiling and passing all the existing unit tests.

          Tomorrow I have to make more unit tests to test the new functionality in RollingFileSink.

          Show
          Ted Malaska added a comment - Just to update: It's taking a longer then I hoped, but I just wanted to let you know I've got the code compiling and passing all the existing unit tests. Tomorrow I have to make more unit tests to test the new functionality in RollingFileSink.
          Hide
          Ted Malaska added a comment -

          By no means is this done. But it does do the following:

          1. Compiles
          2. Passes all existing unit tests
          3. Passes new unit tests that test different types of roll overs and escape sequences

          The code still has a couple of revisions left before it will be ready for review. But I wanted to post it to show progress.

          Show
          Ted Malaska added a comment - By no means is this done. But it does do the following: 1. Compiles 2. Passes all existing unit tests 3. Passes new unit tests that test different types of roll overs and escape sequences The code still has a couple of revisions left before it will be ready for review. But I wanted to post it to show progress.
          Hide
          Ted Malaska added a comment -

          Not ready for final submission but is ready for review to start.

          Show
          Ted Malaska added a comment - Not ready for final submission but is ready for review to start.
          Hide
          Josh West added a comment - - edited

          I'm very interested in this feature. Would be nice to have the same dynamic path construction for log files going to local filesystems, the same as we have with HDFS.

          I've tried to apply the first patch to branches "flume-1.3.0" and "branch-1.2.0" but it fails:

          $ patch -p0 < ~/FLUME-1295-A.patch
          patching file flume-ng-core/src/main/java/org/apache/flume/sink/AbstractBucketWriter.java
          patching file flume-ng-core/src/main/java/org/apache/flume/sink/FileBucketWriter.java
          patching file flume-ng-core/src/main/java/org/apache/flume/sink/RollingFileSink.java
          Hunk #2 FAILED at 33.
          Hunk #3 FAILED at 84.
          Hunk #4 FAILED at 102.
          Hunk #5 FAILED at 220.
          Hunk #6 succeeded at 246 (offset -22 lines).
          4 out of 6 hunks FAILED – saving rejects to file flume-ng-core/src/main/java/org/apache/flume/sink/RollingFileSink.java.rej
          patching file flume-ng-core/src/test/java/org/apache/flume/sink/TestRollingFileSink.java
          Hunk #3 succeeded at 103 (offset -1 lines).
          Hunk #4 succeeded at 176 (offset -3 lines).
          patching file flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java
          Hunk #2 succeeded at 46 (offset -2 lines).
          Hunk #3 succeeded at 79 (offset -2 lines).
          Hunk #4 FAILED at 102.
          1 out of 4 hunks FAILED – saving rejects to file flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java.rej
          patching file flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java
          Hunk #1 succeeded at 45 (offset -1 lines).
          Hunk #2 succeeded at 96 (offset -1 lines).
          Hunk #3 succeeded at 127 (offset -2 lines).
          Hunk #4 succeeded at 472 (offset -3 lines).

          Do you have any advice on how to get this patch applied?

          Thanks.

          Show
          Josh West added a comment - - edited I'm very interested in this feature. Would be nice to have the same dynamic path construction for log files going to local filesystems, the same as we have with HDFS. I've tried to apply the first patch to branches "flume-1.3.0" and "branch-1.2.0" but it fails: $ patch -p0 < ~/ FLUME-1295 -A.patch patching file flume-ng-core/src/main/java/org/apache/flume/sink/AbstractBucketWriter.java patching file flume-ng-core/src/main/java/org/apache/flume/sink/FileBucketWriter.java patching file flume-ng-core/src/main/java/org/apache/flume/sink/RollingFileSink.java Hunk #2 FAILED at 33. Hunk #3 FAILED at 84. Hunk #4 FAILED at 102. Hunk #5 FAILED at 220. Hunk #6 succeeded at 246 (offset -22 lines). 4 out of 6 hunks FAILED – saving rejects to file flume-ng-core/src/main/java/org/apache/flume/sink/RollingFileSink.java.rej patching file flume-ng-core/src/test/java/org/apache/flume/sink/TestRollingFileSink.java Hunk #3 succeeded at 103 (offset -1 lines). Hunk #4 succeeded at 176 (offset -3 lines). patching file flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java Hunk #2 succeeded at 46 (offset -2 lines). Hunk #3 succeeded at 79 (offset -2 lines). Hunk #4 FAILED at 102. 1 out of 4 hunks FAILED – saving rejects to file flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java.rej patching file flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java Hunk #1 succeeded at 45 (offset -1 lines). Hunk #2 succeeded at 96 (offset -1 lines). Hunk #3 succeeded at 127 (offset -2 lines). Hunk #4 succeeded at 472 (offset -3 lines). Do you have any advice on how to get this patch applied? Thanks.
          Hide
          Brock Noland added a comment -

          Looks like it has to be re-based.... Personally what I think we should do with this patch is not modify HDFSEventSink but get all the "infrastructure" ready so we could modify HDFSEventSink once we are sure the new code is working correctly. The reason is HDFSEventSink is used by a ton of people and it's been a source of bugs in the past.

          Show
          Brock Noland added a comment - Looks like it has to be re-based.... Personally what I think we should do with this patch is not modify HDFSEventSink but get all the "infrastructure" ready so we could modify HDFSEventSink once we are sure the new code is working correctly. The reason is HDFSEventSink is used by a ton of people and it's been a source of bugs in the past.
          Hide
          Arron Green added a comment -

          Is this something that is being looked into? I am also very interested in having this feature rolled into the RollingFileSink

          Show
          Arron Green added a comment - Is this something that is being looked into? I am also very interested in having this feature rolled into the RollingFileSink
          Hide
          Ted Malaska added a comment -

          Hey Guys,

          Just reviewing old jiras. Is this still needed. If so I will try to rebase it and follow Brock's advice.

          Ted Malaska

          Show
          Ted Malaska added a comment - Hey Guys, Just reviewing old jiras. Is this still needed. If so I will try to rebase it and follow Brock's advice. Ted Malaska
          Hide
          ryan peterson added a comment -

          I am definitely interested in this.

          We've hacked together something in an effort to deprecate scribe from our environment, but it would be great to see this included in flume-ng core.

          Show
          ryan peterson added a comment - I am definitely interested in this. We've hacked together something in an effort to deprecate scribe from our environment, but it would be great to see this included in flume-ng core.
          Hide
          Arron Green added a comment -

          I'm still interested in this as well

          Show
          Arron Green added a comment - I'm still interested in this as well
          Hide
          Ted Malaska added a comment -

          OK I'm working on 2007 and I will back track to this jira after I finish that one.

          Show
          Ted Malaska added a comment - OK I'm working on 2007 and I will back track to this jira after I finish that one.
          Hide
          Hari Shreedharan added a comment -

          Hey Ted,

          Thanks for picking this one up. You might want to take a look at BucketPath in flume-ng-core. That class does much of the implementation. It should be fairly easy to just do what is being done in the HDFS sink, just that we should make sure there are enough tests for it

          Show
          Hari Shreedharan added a comment - Hey Ted, Thanks for picking this one up. You might want to take a look at BucketPath in flume-ng-core. That class does much of the implementation. It should be fairly easy to just do what is being done in the HDFS sink, just that we should make sure there are enough tests for it

            People

            • Assignee:
              Ted Malaska
              Reporter:
              Bhaskar Marthi
            • Votes:
              6 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:

                Development