Flume
  1. Flume
  2. FLUME-896

Implement a Durable Memory Channel

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: NG alpha 1
    • Fix Version/s: v1.2.0
    • Component/s: Channel
    • Labels:
      None

      Description

      Implement a channel that backs Memory Channel with the file system for durable event delivery.

      1. FLUME-896-10.patch
        78 kB
        Brock Noland
      2. FLUME-896-9.patch
        77 kB
        Brock Noland
      3. FLUME-896-8.patch
        158 kB
        Brock Noland
      4. FLUME-896-7.patch
        157 kB
        Brock Noland
      5. FLUME-896-5.patch
        81 kB
        Brock Noland
      6. FLUME-896-4.patch
        74 kB
        Brock Noland
      7. FLUME-896-1.patch
        60 kB
        Brock Noland

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Hide
          Hudson added a comment -

          Integrated in flume-trunk #153 (See https://builds.apache.org/job/flume-trunk/153/)
          FLUME-896. Implement a Durable Memory Channel.

          (Brock Noland via Arvind Prabhakar) (Revision 1308168)

          Result = UNSTABLE
          arvind : http://svn.apache.org/viewvc/?view=rev&rev=1308168
          Files :

          • /incubator/flume/trunk/flume-ng-channels/flume-file-channel/pom.xml
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/pom.xml
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannel.java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannelEvent.java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/SequenceIDBuffer.java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WAL.java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALDataFile.java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALEntry.java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALReplayResult.java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/TestRecoverableMemoryChannel.java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestSequenceIDBuffer.java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestWAL.java
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/resources
          • /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/resources/log4j.properties
          • /incubator/flume/trunk/flume-ng-channels/pom.xml
          • /incubator/flume/trunk/flume-ng-dist/pom.xml
          • /incubator/flume/trunk/pom.xml
          Show
          Hudson added a comment - Integrated in flume-trunk #153 (See https://builds.apache.org/job/flume-trunk/153/ ) FLUME-896 . Implement a Durable Memory Channel. (Brock Noland via Arvind Prabhakar) (Revision 1308168) Result = UNSTABLE arvind : http://svn.apache.org/viewvc/?view=rev&rev=1308168 Files : /incubator/flume/trunk/flume-ng-channels/flume-file-channel/pom.xml /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/pom.xml /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannel.java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannelEvent.java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/SequenceIDBuffer.java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WAL.java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALDataFile.java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALEntry.java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALReplayResult.java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/TestRecoverableMemoryChannel.java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestSequenceIDBuffer.java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestWAL.java /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/resources /incubator/flume/trunk/flume-ng-channels/flume-recoverable-memory-channel/src/test/resources/log4j.properties /incubator/flume/trunk/flume-ng-channels/pom.xml /incubator/flume/trunk/flume-ng-dist/pom.xml /incubator/flume/trunk/pom.xml
          Hide
          Arvind Prabhakar added a comment -

          Patch committed. Thanks Brock!

          Show
          Arvind Prabhakar added a comment - Patch committed. Thanks Brock!
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/#review6601
          -----------------------------------------------------------

          Ship it!

          +1. Changes look good and all tests pass. Please attach the patch to the Jira.

          • Arvind

          On 2012-03-30 22:13:55, Brock Noland wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4325/

          -----------------------------------------------------------

          (Updated 2012-03-30 22:13:55)

          Review request for Flume.

          Summary

          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.

          https://issues.apache.org/jira/browse/FLUME-896

          Diffs

          -----

          flume-ng-channels/flume-file-channel/pom.xml 48d1481

          flume-ng-channels/flume-recoverable-memory-channel/pom.xml PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannel.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannelEvent.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/SequenceIDBuffer.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WAL.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALDataFile.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALEntry.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALReplayResult.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/TestRecoverableMemoryChannel.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestSequenceIDBuffer.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestWAL.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/test/resources/log4j.properties PRE-CREATION

          flume-ng-channels/pom.xml 0bd8633

          flume-ng-dist/pom.xml 4c49452

          pom.xml c91222f

          Diff: https://reviews.apache.org/r/4325/diff

          Testing

          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/#review6601 ----------------------------------------------------------- Ship it! +1. Changes look good and all tests pass. Please attach the patch to the Jira. Arvind On 2012-03-30 22:13:55, Brock Noland wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-30 22:13:55) Review request for Flume. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs ----- flume-ng-channels/flume-file-channel/pom.xml 48d1481 flume-ng-channels/flume-recoverable-memory-channel/pom.xml PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannel.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannelEvent.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/SequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WAL.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALReplayResult.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/TestRecoverableMemoryChannel.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestSequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestWAL.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/resources/log4j.properties PRE-CREATION flume-ng-channels/pom.xml 0bd8633 flume-ng-dist/pom.xml 4c49452 pom.xml c91222f Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/
          -----------------------------------------------------------

          (Updated 2012-03-30 22:13:55.453271)

          Review request for Flume.

          Changes
          -------

          Updated patch

          Summary
          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.
          https://issues.apache.org/jira/browse/FLUME-896

          Diffs (updated)


          flume-ng-channels/flume-file-channel/pom.xml 48d1481
          flume-ng-channels/flume-recoverable-memory-channel/pom.xml PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannel.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannelEvent.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/SequenceIDBuffer.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WAL.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALDataFile.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALEntry.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALReplayResult.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/TestRecoverableMemoryChannel.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestSequenceIDBuffer.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestWAL.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/test/resources/log4j.properties PRE-CREATION
          flume-ng-channels/pom.xml 0bd8633
          flume-ng-dist/pom.xml 4c49452
          pom.xml c91222f

          Diff: https://reviews.apache.org/r/4325/diff

          Testing
          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-30 22:13:55.453271) Review request for Flume. Changes ------- Updated patch Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs (updated) flume-ng-channels/flume-file-channel/pom.xml 48d1481 flume-ng-channels/flume-recoverable-memory-channel/pom.xml PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannel.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannelEvent.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/SequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WAL.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALReplayResult.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/TestRecoverableMemoryChannel.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestSequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestWAL.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/resources/log4j.properties PRE-CREATION flume-ng-channels/pom.xml 0bd8633 flume-ng-dist/pom.xml 4c49452 pom.xml c91222f Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          Brock Noland added a comment -

          The maven problem is fixed for me, the new module wasn't added to flume-ng-channels/pom.xml. However, org.apache.flume.api.TestFailoverRpcClient fails for me, however that seems test is flaky on my system so I am not sure if it's related.

          Show
          Brock Noland added a comment - The maven problem is fixed for me, the new module wasn't added to flume-ng-channels/pom.xml. However, org.apache.flume.api.TestFailoverRpcClient fails for me, however that seems test is flaky on my system so I am not sure if it's related.
          Hide
          Brock Noland added a comment -

          That patch was bad, but I am still seeing a weird maven problem building a package. Updated patch attached. Still looking...

          Show
          Brock Noland added a comment - That patch was bad, but I am still seeing a weird maven problem building a package. Updated patch attached. Still looking...
          Hide
          Brock Noland added a comment -

          Updated patch is attached.

          Show
          Brock Noland added a comment - Updated patch is attached.
          Hide
          Arvind Prabhakar added a comment -

          I think the patch needs a slight change - the top-level pom should have the following specified in dependencyManagement section:

                <dependency>
                  <groupId>org.apache.flume.flume-ng-channels</groupId>
                  <artifactId>flume-recoverable-memory-channel</artifactId>
                  <version>1.2.0-incubating-SNAPSHOT</version>
                </dependency>
          
          
          Show
          Arvind Prabhakar added a comment - I think the patch needs a slight change - the top-level pom should have the following specified in dependencyManagement section: <dependency> <groupId>org.apache.flume.flume-ng-channels</groupId> <artifactId>flume-recoverable-memory-channel</artifactId> <version>1.2.0-incubating-SNAPSHOT</version> </dependency>
          Hide
          Arvind Prabhakar added a comment -

          I think the patch needs a slight change - the top-level pom should have the following specified in dependencyManagement section:

                <dependency>
                  <groupId>org.apache.flume.flume-ng-channels</groupId>
                  <artifactId>flume-recoverable-memory-channel</artifactId>
                  <version>1.2.0-incubating-SNAPSHOT</version>
                </dependency>
          
          
          Show
          Arvind Prabhakar added a comment - I think the patch needs a slight change - the top-level pom should have the following specified in dependencyManagement section: <dependency> <groupId>org.apache.flume.flume-ng-channels</groupId> <artifactId>flume-recoverable-memory-channel</artifactId> <version>1.2.0-incubating-SNAPSHOT</version> </dependency>
          Hide
          Brock Noland added a comment -

          FLUME-1085 has been created for a true FileChannel.

          Show
          Brock Noland added a comment - FLUME-1085 has been created for a true FileChannel.
          Hide
          Brock Noland added a comment -

          Attached is the patch from RB with the new module included in dist.

          Show
          Brock Noland added a comment - Attached is the patch from RB with the new module included in dist.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/#review6586
          -----------------------------------------------------------

          Ship it!

          Changes look good Brock. Can you please update the dist module so that this gets packaged in the generated distro? Once done, please attach the patch to the Jira.

          • Arvind

          On 2012-03-30 17:43:51, Brock Noland wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4325/

          -----------------------------------------------------------

          (Updated 2012-03-30 17:43:51)

          Review request for Flume.

          Summary

          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.

          https://issues.apache.org/jira/browse/FLUME-896

          Diffs

          -----

          flume-ng-channels/flume-recoverable-memory-channel/pom.xml PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannel.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannelEvent.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/SequenceIDBuffer.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WAL.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALDataFile.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALEntry.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALReplayResult.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/TestRecoverableMemoryChannel.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestSequenceIDBuffer.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestWAL.java PRE-CREATION

          flume-ng-channels/flume-recoverable-memory-channel/src/test/resources/log4j.properties PRE-CREATION

          flume-ng-channels/pom.xml 0bd8633

          flume-ng-core/src/main/java/org/apache/flume/channel/ChannelType.java d8419e8

          Diff: https://reviews.apache.org/r/4325/diff

          Testing

          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/#review6586 ----------------------------------------------------------- Ship it! Changes look good Brock. Can you please update the dist module so that this gets packaged in the generated distro? Once done, please attach the patch to the Jira. Arvind On 2012-03-30 17:43:51, Brock Noland wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-30 17:43:51) Review request for Flume. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs ----- flume-ng-channels/flume-recoverable-memory-channel/pom.xml PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannel.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannelEvent.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/SequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WAL.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALReplayResult.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/TestRecoverableMemoryChannel.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestSequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestWAL.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/resources/log4j.properties PRE-CREATION flume-ng-channels/pom.xml 0bd8633 flume-ng-core/src/main/java/org/apache/flume/channel/ChannelType.java d8419e8 Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/
          -----------------------------------------------------------

          (Updated 2012-03-30 17:43:51.929216)

          Review request for Flume.

          Changes
          -------

          Renamed RecoverableMemoryChannel and made configurable via ENUM

          Summary
          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.
          https://issues.apache.org/jira/browse/FLUME-896

          Diffs (updated)


          flume-ng-channels/flume-recoverable-memory-channel/pom.xml PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannel.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannelEvent.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/SequenceIDBuffer.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WAL.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALDataFile.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALEntry.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALReplayResult.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/TestRecoverableMemoryChannel.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestSequenceIDBuffer.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestWAL.java PRE-CREATION
          flume-ng-channels/flume-recoverable-memory-channel/src/test/resources/log4j.properties PRE-CREATION
          flume-ng-channels/pom.xml 0bd8633
          flume-ng-core/src/main/java/org/apache/flume/channel/ChannelType.java d8419e8

          Diff: https://reviews.apache.org/r/4325/diff

          Testing
          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-30 17:43:51.929216) Review request for Flume. Changes ------- Renamed RecoverableMemoryChannel and made configurable via ENUM Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs (updated) flume-ng-channels/flume-recoverable-memory-channel/pom.xml PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannel.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/RecoverableMemoryChannelEvent.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/SequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WAL.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/main/java/org/apache/flume/channel/recoverable/memory/wal/WALReplayResult.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/TestRecoverableMemoryChannel.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestSequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/java/org/apache/flume/channel/recoverable/memory/wal/TestWAL.java PRE-CREATION flume-ng-channels/flume-recoverable-memory-channel/src/test/resources/log4j.properties PRE-CREATION flume-ng-channels/pom.xml 0bd8633 flume-ng-core/src/main/java/org/apache/flume/channel/ChannelType.java d8419e8 Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          Sharad Agarwal added a comment -

          renaming the source/directory files and configuration keys from FileChannel/WAL to ReliableMemoryChannel/MemoryLog etc instead.

          +1. this implementation is indeed of memory channel backed by a file. Not a true FileChannel.

          Show
          Sharad Agarwal added a comment - renaming the source/directory files and configuration keys from FileChannel/WAL to ReliableMemoryChannel/MemoryLog etc instead. +1. this implementation is indeed of memory channel backed by a file. Not a true FileChannel.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/#review6183
          -----------------------------------------------------------

          Thanks for accommodating the changes Brock. Considering that this implementation primarily operates to provide reliability over the Memory Channel implementation, I suggest renaming the source/directory files and configuration keys from FileChannel/WAL to ReliableMemoryChannel/MemoryLog etc instead. If you agree, we should also rephrase the FLUME-896 Jira to explicitly state as implementing a recoverable/reliable memory channel. We can then open another Jira to track the development of a FileChannel.

          Along with that change, another request is to add it to the enum in ChannelType with an appropriate short name, and also some entries in the configuration template will be great.

          • Arvind

          On 2012-03-21 04:45:55, Brock Noland wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4325/

          -----------------------------------------------------------

          (Updated 2012-03-21 04:45:55)

          Review request for Flume.

          Summary

          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.

          https://issues.apache.org/jira/browse/FLUME-896

          Diffs

          -----

          flume-ng-channels/flume-file-channel/pom.xml 48d1481

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing

          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/#review6183 ----------------------------------------------------------- Thanks for accommodating the changes Brock. Considering that this implementation primarily operates to provide reliability over the Memory Channel implementation, I suggest renaming the source/directory files and configuration keys from FileChannel/WAL to ReliableMemoryChannel/MemoryLog etc instead. If you agree, we should also rephrase the FLUME-896 Jira to explicitly state as implementing a recoverable/reliable memory channel. We can then open another Jira to track the development of a FileChannel. Along with that change, another request is to add it to the enum in ChannelType with an appropriate short name, and also some entries in the configuration template will be great. Arvind On 2012-03-21 04:45:55, Brock Noland wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-21 04:45:55) Review request for Flume. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs ----- flume-ng-channels/flume-file-channel/pom.xml 48d1481 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/
          -----------------------------------------------------------

          (Updated 2012-03-21 04:45:55.885014)

          Review request for Flume.

          Changes
          -------

          Forgot to upload the absolute latest which simply adds a better log message versus the immediately preceding version.

          Summary
          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.
          https://issues.apache.org/jira/browse/FLUME-896

          Diffs (updated)


          flume-ng-channels/flume-file-channel/pom.xml 48d1481
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing
          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-21 04:45:55.885014) Review request for Flume. Changes ------- Forgot to upload the absolute latest which simply adds a better log message versus the immediately preceding version. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs (updated) flume-ng-channels/flume-file-channel/pom.xml 48d1481 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-21 02:29:18, Arvind Prabhakar wrote:

          > Thanks for the patch Brock. The changes look good. Some feedback follows:

          >

          > General:

          > * There are a few places where whitespace has creeped in - highlighted as red in the review. Please remove it in the final patch.

          > * Also, there are a few places where the line length exceeds 80 chars, will be nice to take care of that as well in the final patch.

          >

          > Design/Impl:

          > * The FileBackedTransaction.doCommit() only calls the fileChannel.commitSequenceID(lastSequenceId) if the lastSequenceId is greater than 0. This will not be the case if the channel has no sinks for it. As a result everything will be replayed every time the channel starts up, causing FileChannel.configure() to fail with OOM after a certain threshold is breached. Ideally, the sequeceId commit should happen on every commit call.

          > * On the same lines, the implementation should override the stop() method to perform cleanup and close any of the WAL resources that may be actively in use.

          > * The take implementation seems to be limited to the contents of MemoryChannel only. This will not have any of the events that were persisted in the previous run of the FileChannel and therefore those events will get locked out of any sink's reach.

          > * Finally, it is not clear to me from the implementation how the concurrent semantic will work. Can you please explain (or better still javadoc) the expected workings of the WAL implementation?

          >

          The FileBackedTransaction.doCommit() only calls the fileChannel.commitSequenceID(lastSequenceId) if the lastSequenceId is greater than 0. This will not be the case if the channel has no sinks for it. As a result everything will be replayed every time the channel starts up, causing FileChannel.configure() to fail with OOM after a certain threshold is breached. Ideally, the sequeceId commit should happen on every commit call.

          We can only commit sequence ids which have been read off the queue. All SequenceIDs will be greater than zero so any event read off the queue and committed will have its sequence id written to the WAL. Note we can still OOM when the number of sequence ids not events which have been written to disk exceed 8.3 million (64MB (default direct memory size) / 8 bytes). We can predicate when this may occur and print a message saying how much -XX:MaxDirectMemorySize needs to be increased when it does occur.

          At this point, I think the only alternative is to implement complex merge sort logic for the sequence id portion of the WAL.

          On the same lines, the implementation should override the stop() method to perform cleanup and close any of the WAL resources that may be actively in use.

          done

          The take implementation seems to be limited to the contents of MemoryChannel only. This will not have any of the events that were persisted in the previous run of the FileChannel and therefore those events will get locked out of any sink's reach.

          When we replay the log we re-populate MemoryChannel so events there were committed to MemoryChannel will be persisted to disk.

          Finally, it is not clear to me from the implementation how the concurrent semantic will work. Can you please explain (or better still javadoc) the expected workings of the WAL implementation?

          I updated the patch based on some testing on a cluster I did. I found that sequence id writes were waiting on event writes often. As such, now the only synchronization WAL does is when a roll is occurring. Otherwise the synchronization has been pushed down to the object actually writing to the file WALDataFile.Writer.

          On 2012-03-21 02:29:18, Arvind Prabhakar wrote:

          > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java, line 90

          > <https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line90>

          >

          > It will be great if this memory channel can also be configured correctly via the system configuration. That way, the end user will be able to specify the cap on the number of events that can live in all the active transactions combined.

          >

          > This will help ensure that this channel does not run into OOM issues.

          done

          On 2012-03-21 02:29:18, Arvind Prabhakar wrote:

          > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java, line 108

          > <https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line108>

          >

          > This constructor does not allow the specification of roll size, roll required interval, max log size, min log retention period and worker interval. All of these properties should be extracted from the context and passed into the WAL constructor.

          done

          On 2012-03-21 02:29:18, Arvind Prabhakar wrote:

          > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java, lines 109-111

          > <https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line109>

          >

          > This logic is probably better done in the start() method than in the configure method.

          done

          On 2012-03-21 02:29:18, Arvind Prabhakar wrote:

          > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java, line 1

          > <https://reviews.apache.org/r/4325/diff/4/?file=92221#file92221line1>

          >

          > Missing license header.

          done

          On 2012-03-21 02:29:18, Arvind Prabhakar wrote:

          > flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java, line 1

          > <https://reviews.apache.org/r/4325/diff/4/?file=92227#file92227line1>

          >

          > Missing license header.

          done

          On 2012-03-21 02:29:18, Arvind Prabhakar wrote:

          > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java, line 84

          > <https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line84>

          >

          > The JDBC channel defaults to ${user.home}/.flume/jdbc-channel. I suggest defaulting the file channel working directory to something similar like ${user.home}/.flume/file-channel

          done

          On 2012-03-21 02:29:18, Arvind Prabhakar wrote:

          > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java, lines 61-65

          > <https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line61>

          >

          > Please override the initialize() method to do any initialization for the system.

          This is done in the configure method since this should be overridden.

          On 2012-03-21 02:29:18, Arvind Prabhakar wrote:

          > flume-ng-channels/flume-file-channel/pom.xml, lines 76-78

          > <https://reviews.apache.org/r/4325/diff/4/?file=92218#file92218line76>

          >

          > This change seems to be unrelated to the file channel. If that is the case, we should factor it out.

          We need hadoop as a dep so we can use writables.

          • Brock

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/#review6126
          -----------------------------------------------------------

          On 2012-03-21 04:45:55, Brock Noland wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4325/

          -----------------------------------------------------------

          (Updated 2012-03-21 04:45:55)

          Review request for Flume.

          Summary

          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.

          https://issues.apache.org/jira/browse/FLUME-896

          Diffs

          -----

          flume-ng-channels/flume-file-channel/pom.xml 48d1481

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing

          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-21 02:29:18, Arvind Prabhakar wrote: > Thanks for the patch Brock. The changes look good. Some feedback follows: > > General: > * There are a few places where whitespace has creeped in - highlighted as red in the review. Please remove it in the final patch. > * Also, there are a few places where the line length exceeds 80 chars, will be nice to take care of that as well in the final patch. > > Design/Impl: > * The FileBackedTransaction.doCommit() only calls the fileChannel.commitSequenceID(lastSequenceId) if the lastSequenceId is greater than 0. This will not be the case if the channel has no sinks for it. As a result everything will be replayed every time the channel starts up, causing FileChannel.configure() to fail with OOM after a certain threshold is breached. Ideally, the sequeceId commit should happen on every commit call. > * On the same lines, the implementation should override the stop() method to perform cleanup and close any of the WAL resources that may be actively in use. > * The take implementation seems to be limited to the contents of MemoryChannel only. This will not have any of the events that were persisted in the previous run of the FileChannel and therefore those events will get locked out of any sink's reach. > * Finally, it is not clear to me from the implementation how the concurrent semantic will work. Can you please explain (or better still javadoc) the expected workings of the WAL implementation? > The FileBackedTransaction.doCommit() only calls the fileChannel.commitSequenceID(lastSequenceId) if the lastSequenceId is greater than 0. This will not be the case if the channel has no sinks for it. As a result everything will be replayed every time the channel starts up, causing FileChannel.configure() to fail with OOM after a certain threshold is breached. Ideally, the sequeceId commit should happen on every commit call. We can only commit sequence ids which have been read off the queue. All SequenceIDs will be greater than zero so any event read off the queue and committed will have its sequence id written to the WAL. Note we can still OOM when the number of sequence ids not events which have been written to disk exceed 8.3 million (64MB (default direct memory size) / 8 bytes). We can predicate when this may occur and print a message saying how much -XX:MaxDirectMemorySize needs to be increased when it does occur. At this point, I think the only alternative is to implement complex merge sort logic for the sequence id portion of the WAL. On the same lines, the implementation should override the stop() method to perform cleanup and close any of the WAL resources that may be actively in use. done The take implementation seems to be limited to the contents of MemoryChannel only. This will not have any of the events that were persisted in the previous run of the FileChannel and therefore those events will get locked out of any sink's reach. When we replay the log we re-populate MemoryChannel so events there were committed to MemoryChannel will be persisted to disk. Finally, it is not clear to me from the implementation how the concurrent semantic will work. Can you please explain (or better still javadoc) the expected workings of the WAL implementation? I updated the patch based on some testing on a cluster I did. I found that sequence id writes were waiting on event writes often. As such, now the only synchronization WAL does is when a roll is occurring. Otherwise the synchronization has been pushed down to the object actually writing to the file WALDataFile.Writer. On 2012-03-21 02:29:18, Arvind Prabhakar wrote: > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java, line 90 > < https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line90 > > > It will be great if this memory channel can also be configured correctly via the system configuration. That way, the end user will be able to specify the cap on the number of events that can live in all the active transactions combined. > > This will help ensure that this channel does not run into OOM issues. done On 2012-03-21 02:29:18, Arvind Prabhakar wrote: > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java, line 108 > < https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line108 > > > This constructor does not allow the specification of roll size, roll required interval, max log size, min log retention period and worker interval. All of these properties should be extracted from the context and passed into the WAL constructor. done On 2012-03-21 02:29:18, Arvind Prabhakar wrote: > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java, lines 109-111 > < https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line109 > > > This logic is probably better done in the start() method than in the configure method. done On 2012-03-21 02:29:18, Arvind Prabhakar wrote: > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java, line 1 > < https://reviews.apache.org/r/4325/diff/4/?file=92221#file92221line1 > > > Missing license header. done On 2012-03-21 02:29:18, Arvind Prabhakar wrote: > flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java, line 1 > < https://reviews.apache.org/r/4325/diff/4/?file=92227#file92227line1 > > > Missing license header. done On 2012-03-21 02:29:18, Arvind Prabhakar wrote: > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java, line 84 > < https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line84 > > > The JDBC channel defaults to ${user.home}/.flume/jdbc-channel. I suggest defaulting the file channel working directory to something similar like ${user.home}/.flume/file-channel done On 2012-03-21 02:29:18, Arvind Prabhakar wrote: > flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java, lines 61-65 > < https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line61 > > > Please override the initialize() method to do any initialization for the system. This is done in the configure method since this should be overridden. On 2012-03-21 02:29:18, Arvind Prabhakar wrote: > flume-ng-channels/flume-file-channel/pom.xml, lines 76-78 > < https://reviews.apache.org/r/4325/diff/4/?file=92218#file92218line76 > > > This change seems to be unrelated to the file channel. If that is the case, we should factor it out. We need hadoop as a dep so we can use writables. Brock ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/#review6126 ----------------------------------------------------------- On 2012-03-21 04:45:55, Brock Noland wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-21 04:45:55) Review request for Flume. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs ----- flume-ng-channels/flume-file-channel/pom.xml 48d1481 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          Brock Noland added a comment -

          Patch on RB.

          Show
          Brock Noland added a comment - Patch on RB.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/
          -----------------------------------------------------------

          (Updated 2012-03-21 04:42:58.155747)

          Review request for Flume.

          Changes
          -------

          1) Move WAL synchronization down to WalDataFile.Writer as I saw threads writing events blocking on threads writing sequenceids which are seperate files.
          2) Configuration now pulled from Context.
          3) Log a warning when we expect to be unable to replay the logs and error when we cannot replay them do to memory.

          Summary
          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.
          https://issues.apache.org/jira/browse/FLUME-896

          Diffs (updated)


          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453
          flume-ng-channels/flume-file-channel/pom.xml 48d1481
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing
          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-21 04:42:58.155747) Review request for Flume. Changes ------- 1) Move WAL synchronization down to WalDataFile.Writer as I saw threads writing events blocking on threads writing sequenceids which are seperate files. 2) Configuration now pulled from Context. 3) Log a warning when we expect to be unable to replay the logs and error when we cannot replay them do to memory. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs (updated) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/pom.xml 48d1481 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/#review6126
          -----------------------------------------------------------

          Thanks for the patch Brock. The changes look good. Some feedback follows:

          General:

          • There are a few places where whitespace has creeped in - highlighted as red in the review. Please remove it in the final patch.
          • Also, there are a few places where the line length exceeds 80 chars, will be nice to take care of that as well in the final patch.

          Design/Impl:

          • The FileBackedTransaction.doCommit() only calls the fileChannel.commitSequenceID(lastSequenceId) if the lastSequenceId is greater than 0. This will not be the case if the channel has no sinks for it. As a result everything will be replayed every time the channel starts up, causing FileChannel.configure() to fail with OOM after a certain threshold is breached. Ideally, the sequeceId commit should happen on every commit call.
          • On the same lines, the implementation should override the stop() method to perform cleanup and close any of the WAL resources that may be actively in use.
          • The take implementation seems to be limited to the contents of MemoryChannel only. This will not have any of the events that were persisted in the previous run of the FileChannel and therefore those events will get locked out of any sink's reach.
          • Finally, it is not clear to me from the implementation how the concurrent semantic will work. Can you please explain (or better still javadoc) the expected workings of the WAL implementation?

          flume-ng-channels/flume-file-channel/pom.xml
          <https://reviews.apache.org/r/4325/#comment13164>

          This change seems to be unrelated to the file channel. If that is the case, we should factor it out.

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
          <https://reviews.apache.org/r/4325/#comment13181>

          Please override the initialize() method to do any initialization for the system.

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
          <https://reviews.apache.org/r/4325/#comment13180>

          The JDBC channel defaults to $

          {user.home}/.flume/jdbc-channel. I suggest defaulting the file channel working directory to something similar like ${user.home}

          /.flume/file-channel

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
          <https://reviews.apache.org/r/4325/#comment13214>

          It will be great if this memory channel can also be configured correctly via the system configuration. That way, the end user will be able to specify the cap on the number of events that can live in all the active transactions combined.

          This will help ensure that this channel does not run into OOM issues.

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
          <https://reviews.apache.org/r/4325/#comment13182>

          This constructor does not allow the specification of roll size, roll required interval, max log size, min log retention period and worker interval. All of these properties should be extracted from the context and passed into the WAL constructor.

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
          <https://reviews.apache.org/r/4325/#comment13222>

          This logic is probably better done in the start() method than in the configure method.

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java
          <https://reviews.apache.org/r/4325/#comment13162>

          Missing license header.

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java
          <https://reviews.apache.org/r/4325/#comment13163>

          Missing license header.

          • Arvind

          On 2012-03-14 19:04:52, Brock Noland wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4325/

          -----------------------------------------------------------

          (Updated 2012-03-14 19:04:52)

          Review request for Flume.

          Summary

          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.

          https://issues.apache.org/jira/browse/FLUME-896

          Diffs

          -----

          flume-ng-channels/flume-file-channel/pom.xml ee2d20f

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing

          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/#review6126 ----------------------------------------------------------- Thanks for the patch Brock. The changes look good. Some feedback follows: General: There are a few places where whitespace has creeped in - highlighted as red in the review. Please remove it in the final patch. Also, there are a few places where the line length exceeds 80 chars, will be nice to take care of that as well in the final patch. Design/Impl: The FileBackedTransaction.doCommit() only calls the fileChannel.commitSequenceID(lastSequenceId) if the lastSequenceId is greater than 0. This will not be the case if the channel has no sinks for it. As a result everything will be replayed every time the channel starts up, causing FileChannel.configure() to fail with OOM after a certain threshold is breached. Ideally, the sequeceId commit should happen on every commit call. On the same lines, the implementation should override the stop() method to perform cleanup and close any of the WAL resources that may be actively in use. The take implementation seems to be limited to the contents of MemoryChannel only. This will not have any of the events that were persisted in the previous run of the FileChannel and therefore those events will get locked out of any sink's reach. Finally, it is not clear to me from the implementation how the concurrent semantic will work. Can you please explain (or better still javadoc) the expected workings of the WAL implementation? flume-ng-channels/flume-file-channel/pom.xml < https://reviews.apache.org/r/4325/#comment13164 > This change seems to be unrelated to the file channel. If that is the case, we should factor it out. flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java < https://reviews.apache.org/r/4325/#comment13181 > Please override the initialize() method to do any initialization for the system. flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java < https://reviews.apache.org/r/4325/#comment13180 > The JDBC channel defaults to $ {user.home}/.flume/jdbc-channel. I suggest defaulting the file channel working directory to something similar like ${user.home} /.flume/file-channel flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java < https://reviews.apache.org/r/4325/#comment13214 > It will be great if this memory channel can also be configured correctly via the system configuration. That way, the end user will be able to specify the cap on the number of events that can live in all the active transactions combined. This will help ensure that this channel does not run into OOM issues. flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java < https://reviews.apache.org/r/4325/#comment13182 > This constructor does not allow the specification of roll size, roll required interval, max log size, min log retention period and worker interval. All of these properties should be extracted from the context and passed into the WAL constructor. flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java < https://reviews.apache.org/r/4325/#comment13222 > This logic is probably better done in the start() method than in the configure method. flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java < https://reviews.apache.org/r/4325/#comment13162 > Missing license header. flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java < https://reviews.apache.org/r/4325/#comment13163 > Missing license header. Arvind On 2012-03-14 19:04:52, Brock Noland wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-14 19:04:52) Review request for Flume. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs ----- flume-ng-channels/flume-file-channel/pom.xml ee2d20f flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/
          -----------------------------------------------------------

          (Updated 2012-03-14 19:04:52.742262)

          Review request for Flume.

          Changes
          -------

          Once again this not meant for commit, I personally wanted to give this a try myself. Might be useful, might not. Regardless, I just wanted to upload the latest patch before I move on to other work.

          The latest changes keep the large memory consumption during replay in an off heap buffer. If the amount of SequenceIDs to be replayed is greater than 64MB -XX:MaxDirectMemorySize would need to be increased. The advantage of this approach is that if a large amount of ram is needed for replaying it can be explicitly allocated and deallocated during that period of time as opposed to being in the heap always.

          Summary
          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.
          https://issues.apache.org/jira/browse/FLUME-896

          Diffs (updated)


          flume-ng-channels/flume-file-channel/pom.xml ee2d20f
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing
          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-14 19:04:52.742262) Review request for Flume. Changes ------- Once again this not meant for commit, I personally wanted to give this a try myself. Might be useful, might not. Regardless, I just wanted to upload the latest patch before I move on to other work. The latest changes keep the large memory consumption during replay in an off heap buffer. If the amount of SequenceIDs to be replayed is greater than 64MB -XX:MaxDirectMemorySize would need to be increased. The advantage of this approach is that if a large amount of ram is needed for replaying it can be explicitly allocated and deallocated during that period of time as opposed to being in the heap always. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs (updated) flume-ng-channels/flume-file-channel/pom.xml ee2d20f flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          Brock Noland added a comment -

          Once again this not meant for commit, I personally wanted to give this a try myself. Might be useful, might not. Regardless, I just wanted to upload the latest patch before I move on to other work.

          Show
          Brock Noland added a comment - Once again this not meant for commit, I personally wanted to give this a try myself. Might be useful, might not. Regardless, I just wanted to upload the latest patch before I move on to other work.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/
          -----------------------------------------------------------

          (Updated 2012-03-14 14:20:26.585694)

          Review request for Flume.

          Changes
          -------

          Fixed a variable rename which was missed.

          Summary
          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.
          https://issues.apache.org/jira/browse/FLUME-896

          Diffs (updated)


          flume-ng-channels/flume-file-channel/pom.xml ee2d20f
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing
          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-14 14:20:26.585694) Review request for Flume. Changes ------- Fixed a variable rename which was missed. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs (updated) flume-ng-channels/flume-file-channel/pom.xml ee2d20f flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/
          -----------------------------------------------------------

          (Updated 2012-03-14 14:11:18.627099)

          Review request for Flume.

          Changes
          -------

          Once again, for anyone watching. I am not proposing this for immediate commit. It's more of a thought process/discussion.

          The attached is less sexy, but would work in your scenario. Basically we read all the sequence ids into a large array (for memory savings), sort it, and then only replay entries not in the array. The memory consumption here is not terrible. Let's say we had 5GB of events on disk at 100 bytes an item. That translates into 53687091 items at 8 bytes an long that's 400MB of ram replay the 5GB of events.

          Summary
          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.
          https://issues.apache.org/jira/browse/FLUME-896

          Diffs (updated)


          flume-ng-channels/flume-file-channel/pom.xml ee2d20f
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing
          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-14 14:11:18.627099) Review request for Flume. Changes ------- Once again, for anyone watching. I am not proposing this for immediate commit. It's more of a thought process/discussion. The attached is less sexy, but would work in your scenario. Basically we read all the sequence ids into a large array (for memory savings), sort it, and then only replay entries not in the array. The memory consumption here is not terrible. Let's say we had 5GB of events on disk at 100 bytes an item. That translates into 53687091 items at 8 bytes an long that's 400MB of ram replay the 5GB of events. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs (updated) flume-ng-channels/flume-file-channel/pom.xml ee2d20f flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 08:27:28, Juhani Connolly wrote:

          > This looks like a great start on a tough task...

          >

          > I'm unsure about one thing that doesn't seem to be covered by tests:

          >

          > When parallel threads each have their own transactions, there is no guarantee that FileChannel.commitSequenceId is called in seqid order. Some threads will already have committed some or all of their entries, but because of a lower commitSequenceId entry the entries will still be "revived" in replay.

          >

          > e.g. thread a and b do some puts/takes and commit

          >

          > action | seqId | lastSequenceId

          > a.put | 1

          > b.put | 2

          > b.put | 3

          > a.put | 4

          > b.take | | 3

          > a.commit

          > b.commit

          >

          > at this point the sequenceId file would just contain 3

          > events would contain 1,2,3,4

          > channel a would contain 1,4

          > channel b would contain 2

          >

          > Now the process dies, we bring it back, replay the WAL:

          >

          > - replaying the sequenceId file(s) would detect 3 as the oldest commitSequenceId

          > - only entry 4 would be > 3 and thus restored to entries

          > - entries 1,2 which were never take()d have gone missing.

          >

          > Does this fall within the purview of best effort delivery(so it is acceptable to lose these since we are not guaranteeing delivery)?

          > Or am I missing something?

          Juhani Connolly wrote:

          I was going to write a unit test but there's no mechanism to force a WAL write. While changing implementation for testing isn't good, I think it may make sense to roll the wall when doClose is called on FileChannel.

          Juhani Connolly wrote:

          Also, I got the example wrong(because of the order take's are made in)... the lastSequenceId should be 2, resulting in 3,4, getting recovered but 1 still missing.

          With that being said, have you considered treating take's and put's as separate entries in the wal, with commit id's assigned to them at commit time. You can then periodically "compact" the wal by replaying its contents, removing matched takes/puts(you may want to look at how this is done in HBase)?

          Yep I think you are correct, say we have two threads doing takes. Thread 1 does a take and gets Event(SeqID=5), Thread 2 does a take and gets Events(SeqID=6). Thread 2 commits and Thread 1 rollback. At this point Event(SeqID=5) is stored only in memory as it will not get replayed if the process should restart.

          • Brock

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/#review5926
          -----------------------------------------------------------

          On 2012-03-14 01:41:47, Brock Noland wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4325/

          -----------------------------------------------------------

          (Updated 2012-03-14 01:41:47)

          Review request for Flume.

          Summary

          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.

          https://issues.apache.org/jira/browse/FLUME-896

          Diffs

          -----

          flume-ng-channels/flume-file-channel/pom.xml ee2d20f

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing

          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-14 08:27:28, Juhani Connolly wrote: > This looks like a great start on a tough task... > > I'm unsure about one thing that doesn't seem to be covered by tests: > > When parallel threads each have their own transactions, there is no guarantee that FileChannel.commitSequenceId is called in seqid order. Some threads will already have committed some or all of their entries, but because of a lower commitSequenceId entry the entries will still be "revived" in replay. > > e.g. thread a and b do some puts/takes and commit > > action | seqId | lastSequenceId > a.put | 1 > b.put | 2 > b.put | 3 > a.put | 4 > b.take | | 3 > a.commit > b.commit > > at this point the sequenceId file would just contain 3 > events would contain 1,2,3,4 > channel a would contain 1,4 > channel b would contain 2 > > Now the process dies, we bring it back, replay the WAL: > > - replaying the sequenceId file(s) would detect 3 as the oldest commitSequenceId > - only entry 4 would be > 3 and thus restored to entries > - entries 1,2 which were never take()d have gone missing. > > Does this fall within the purview of best effort delivery(so it is acceptable to lose these since we are not guaranteeing delivery)? > Or am I missing something? Juhani Connolly wrote: I was going to write a unit test but there's no mechanism to force a WAL write. While changing implementation for testing isn't good, I think it may make sense to roll the wall when doClose is called on FileChannel. Juhani Connolly wrote: Also, I got the example wrong(because of the order take's are made in)... the lastSequenceId should be 2, resulting in 3,4, getting recovered but 1 still missing. With that being said, have you considered treating take's and put's as separate entries in the wal, with commit id's assigned to them at commit time. You can then periodically "compact" the wal by replaying its contents, removing matched takes/puts(you may want to look at how this is done in HBase)? Yep I think you are correct, say we have two threads doing takes. Thread 1 does a take and gets Event(SeqID=5), Thread 2 does a take and gets Events(SeqID=6). Thread 2 commits and Thread 1 rollback. At this point Event(SeqID=5) is stored only in memory as it will not get replayed if the process should restart. Brock ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/#review5926 ----------------------------------------------------------- On 2012-03-14 01:41:47, Brock Noland wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-14 01:41:47) Review request for Flume. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs ----- flume-ng-channels/flume-file-channel/pom.xml ee2d20f flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 08:27:28, Juhani Connolly wrote:

          > This looks like a great start on a tough task...

          >

          > I'm unsure about one thing that doesn't seem to be covered by tests:

          >

          > When parallel threads each have their own transactions, there is no guarantee that FileChannel.commitSequenceId is called in seqid order. Some threads will already have committed some or all of their entries, but because of a lower commitSequenceId entry the entries will still be "revived" in replay.

          >

          > e.g. thread a and b do some puts/takes and commit

          >

          > action | seqId | lastSequenceId

          > a.put | 1

          > b.put | 2

          > b.put | 3

          > a.put | 4

          > b.take | | 3

          > a.commit

          > b.commit

          >

          > at this point the sequenceId file would just contain 3

          > events would contain 1,2,3,4

          > channel a would contain 1,4

          > channel b would contain 2

          >

          > Now the process dies, we bring it back, replay the WAL:

          >

          > - replaying the sequenceId file(s) would detect 3 as the oldest commitSequenceId

          > - only entry 4 would be > 3 and thus restored to entries

          > - entries 1,2 which were never take()d have gone missing.

          >

          > Does this fall within the purview of best effort delivery(so it is acceptable to lose these since we are not guaranteeing delivery)?

          > Or am I missing something?

          Juhani Connolly wrote:

          I was going to write a unit test but there's no mechanism to force a WAL write. While changing implementation for testing isn't good, I think it may make sense to roll the wall when doClose is called on FileChannel.

          Also, I got the example wrong(because of the order take's are made in)... the lastSequenceId should be 2, resulting in 3,4, getting recovered but 1 still missing.

          With that being said, have you considered treating take's and put's as separate entries in the wal, with commit id's assigned to them at commit time. You can then periodically "compact" the wal by replaying its contents, removing matched takes/puts(you may want to look at how this is done in HBase)?

          • Juhani

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/#review5926
          -----------------------------------------------------------

          On 2012-03-14 01:41:47, Brock Noland wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4325/

          -----------------------------------------------------------

          (Updated 2012-03-14 01:41:47)

          Review request for Flume.

          Summary

          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.

          https://issues.apache.org/jira/browse/FLUME-896

          Diffs

          -----

          flume-ng-channels/flume-file-channel/pom.xml ee2d20f

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing

          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-14 08:27:28, Juhani Connolly wrote: > This looks like a great start on a tough task... > > I'm unsure about one thing that doesn't seem to be covered by tests: > > When parallel threads each have their own transactions, there is no guarantee that FileChannel.commitSequenceId is called in seqid order. Some threads will already have committed some or all of their entries, but because of a lower commitSequenceId entry the entries will still be "revived" in replay. > > e.g. thread a and b do some puts/takes and commit > > action | seqId | lastSequenceId > a.put | 1 > b.put | 2 > b.put | 3 > a.put | 4 > b.take | | 3 > a.commit > b.commit > > at this point the sequenceId file would just contain 3 > events would contain 1,2,3,4 > channel a would contain 1,4 > channel b would contain 2 > > Now the process dies, we bring it back, replay the WAL: > > - replaying the sequenceId file(s) would detect 3 as the oldest commitSequenceId > - only entry 4 would be > 3 and thus restored to entries > - entries 1,2 which were never take()d have gone missing. > > Does this fall within the purview of best effort delivery(so it is acceptable to lose these since we are not guaranteeing delivery)? > Or am I missing something? Juhani Connolly wrote: I was going to write a unit test but there's no mechanism to force a WAL write. While changing implementation for testing isn't good, I think it may make sense to roll the wall when doClose is called on FileChannel. Also, I got the example wrong(because of the order take's are made in)... the lastSequenceId should be 2, resulting in 3,4, getting recovered but 1 still missing. With that being said, have you considered treating take's and put's as separate entries in the wal, with commit id's assigned to them at commit time. You can then periodically "compact" the wal by replaying its contents, removing matched takes/puts(you may want to look at how this is done in HBase)? Juhani ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/#review5926 ----------------------------------------------------------- On 2012-03-14 01:41:47, Brock Noland wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-14 01:41:47) Review request for Flume. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs ----- flume-ng-channels/flume-file-channel/pom.xml ee2d20f flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 08:27:28, Juhani Connolly wrote:

          > This looks like a great start on a tough task...

          >

          > I'm unsure about one thing that doesn't seem to be covered by tests:

          >

          > When parallel threads each have their own transactions, there is no guarantee that FileChannel.commitSequenceId is called in seqid order. Some threads will already have committed some or all of their entries, but because of a lower commitSequenceId entry the entries will still be "revived" in replay.

          >

          > e.g. thread a and b do some puts/takes and commit

          >

          > action | seqId | lastSequenceId

          > a.put | 1

          > b.put | 2

          > b.put | 3

          > a.put | 4

          > b.take | | 3

          > a.commit

          > b.commit

          >

          > at this point the sequenceId file would just contain 3

          > events would contain 1,2,3,4

          > channel a would contain 1,4

          > channel b would contain 2

          >

          > Now the process dies, we bring it back, replay the WAL:

          >

          > - replaying the sequenceId file(s) would detect 3 as the oldest commitSequenceId

          > - only entry 4 would be > 3 and thus restored to entries

          > - entries 1,2 which were never take()d have gone missing.

          >

          > Does this fall within the purview of best effort delivery(so it is acceptable to lose these since we are not guaranteeing delivery)?

          > Or am I missing something?

          I was going to write a unit test but there's no mechanism to force a WAL write. While changing implementation for testing isn't good, I think it may make sense to roll the wall when doClose is called on FileChannel.

          • Juhani

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/#review5926
          -----------------------------------------------------------

          On 2012-03-14 01:41:47, Brock Noland wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4325/

          -----------------------------------------------------------

          (Updated 2012-03-14 01:41:47)

          Review request for Flume.

          Summary

          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.

          https://issues.apache.org/jira/browse/FLUME-896

          Diffs

          -----

          flume-ng-channels/flume-file-channel/pom.xml ee2d20f

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing

          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-14 08:27:28, Juhani Connolly wrote: > This looks like a great start on a tough task... > > I'm unsure about one thing that doesn't seem to be covered by tests: > > When parallel threads each have their own transactions, there is no guarantee that FileChannel.commitSequenceId is called in seqid order. Some threads will already have committed some or all of their entries, but because of a lower commitSequenceId entry the entries will still be "revived" in replay. > > e.g. thread a and b do some puts/takes and commit > > action | seqId | lastSequenceId > a.put | 1 > b.put | 2 > b.put | 3 > a.put | 4 > b.take | | 3 > a.commit > b.commit > > at this point the sequenceId file would just contain 3 > events would contain 1,2,3,4 > channel a would contain 1,4 > channel b would contain 2 > > Now the process dies, we bring it back, replay the WAL: > > - replaying the sequenceId file(s) would detect 3 as the oldest commitSequenceId > - only entry 4 would be > 3 and thus restored to entries > - entries 1,2 which were never take()d have gone missing. > > Does this fall within the purview of best effort delivery(so it is acceptable to lose these since we are not guaranteeing delivery)? > Or am I missing something? I was going to write a unit test but there's no mechanism to force a WAL write. While changing implementation for testing isn't good, I think it may make sense to roll the wall when doClose is called on FileChannel. Juhani ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/#review5926 ----------------------------------------------------------- On 2012-03-14 01:41:47, Brock Noland wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-14 01:41:47) Review request for Flume. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs ----- flume-ng-channels/flume-file-channel/pom.xml ee2d20f flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/#review5926
          -----------------------------------------------------------

          This looks like a great start on a tough task...

          I'm unsure about one thing that doesn't seem to be covered by tests:

          When parallel threads each have their own transactions, there is no guarantee that FileChannel.commitSequenceId is called in seqid order. Some threads will already have committed some or all of their entries, but because of a lower commitSequenceId entry the entries will still be "revived" in replay.

          e.g. thread a and b do some puts/takes and commit

          action | seqId | lastSequenceId
          a.put | 1
          b.put | 2
          b.put | 3
          a.put | 4
          b.take | | 3
          a.commit
          b.commit

          at this point the sequenceId file would just contain 3
          events would contain 1,2,3,4
          channel a would contain 1,4
          channel b would contain 2

          Now the process dies, we bring it back, replay the WAL:

          • replaying the sequenceId file(s) would detect 3 as the oldest commitSequenceId
          • only entry 4 would be > 3 and thus restored to entries
          • entries 1,2 which were never take()d have gone missing.

          Does this fall within the purview of best effort delivery(so it is acceptable to lose these since we are not guaranteeing delivery)?
          Or am I missing something?

          • Juhani

          On 2012-03-14 01:41:47, Brock Noland wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4325/

          -----------------------------------------------------------

          (Updated 2012-03-14 01:41:47)

          Review request for Flume.

          Summary

          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.

          https://issues.apache.org/jira/browse/FLUME-896

          Diffs

          -----

          flume-ng-channels/flume-file-channel/pom.xml ee2d20f

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998

          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing

          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/#review5926 ----------------------------------------------------------- This looks like a great start on a tough task... I'm unsure about one thing that doesn't seem to be covered by tests: When parallel threads each have their own transactions, there is no guarantee that FileChannel.commitSequenceId is called in seqid order. Some threads will already have committed some or all of their entries, but because of a lower commitSequenceId entry the entries will still be "revived" in replay. e.g. thread a and b do some puts/takes and commit action | seqId | lastSequenceId a.put | 1 b.put | 2 b.put | 3 a.put | 4 b.take | | 3 a.commit b.commit at this point the sequenceId file would just contain 3 events would contain 1,2,3,4 channel a would contain 1,4 channel b would contain 2 Now the process dies, we bring it back, replay the WAL: replaying the sequenceId file(s) would detect 3 as the oldest commitSequenceId only entry 4 would be > 3 and thus restored to entries entries 1,2 which were never take()d have gone missing. Does this fall within the purview of best effort delivery(so it is acceptable to lose these since we are not guaranteeing delivery)? Or am I missing something? Juhani On 2012-03-14 01:41:47, Brock Noland wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- (Updated 2012-03-14 01:41:47) Review request for Flume. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs ----- flume-ng-channels/flume-file-channel/pom.xml ee2d20f flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          Brock Noland added a comment -

          Attached patch is not meant for commit. I put it on RB so it'd be easy to review.

          I was curious to see if I could do this so I wrote the attached. Seems to work well but hasn't been looked at by anyone.

          Show
          Brock Noland added a comment - Attached patch is not meant for commit. I put it on RB so it'd be easy to review. I was curious to see if I could do this so I wrote the attached. Seems to work well but hasn't been looked at by anyone.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4325/
          -----------------------------------------------------------

          Review request for Flume.

          Summary
          -------

          Attached patch not meant for commit. Just posting here for easy review.

          This addresses bug FLUME-896.
          https://issues.apache.org/jira/browse/FLUME-896

          Diffs


          flume-ng-channels/flume-file-channel/pom.xml ee2d20f
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998
          flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4325/diff

          Testing
          -------

          Thanks,

          Brock

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4325/ ----------------------------------------------------------- Review request for Flume. Summary ------- Attached patch not meant for commit. Just posting here for easy review. This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs flume-ng-channels/flume-file-channel/pom.xml ee2d20f flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java a279453 flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java PRE-CREATION flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplyResult.java PRE-CREATION flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java ab66998 flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java PRE-CREATION Diff: https://reviews.apache.org/r/4325/diff Testing ------- Thanks, Brock
          Hide
          Juhani Connolly added a comment -

          Removing the patch available flag to clear up current status

          Show
          Juhani Connolly added a comment - Removing the patch available flag to clear up current status
          Hide
          Peter Newcomb added a comment -

          Reverting my inappropriate reassignment of the issue.

          Show
          Peter Newcomb added a comment - Reverting my inappropriate reassignment of the issue.
          Hide
          Peter Newcomb added a comment -

          Sorry, I didn't mean to offend. I was trying to follow the instructions on the how-to-contribute[1] page, but failed to recognize the difference between assigning a JIRA to myself and REassigning a JIRA to myself.

          I did attempt to describe what I was doing as early as I could in the comments above (though I'm sure I could have done better), but as I was primarily focused on meeting an internal deadline I could not wait to write code until after public discussion. I'm perfectly OK with my code and even approach to be rejected as either inadequate or simply not desired, but wanted to share in case it was useful. I'm also happy to shelve this and join in discussion on a FEP on the subject as you suggest.

          Anyway, the code I posted does actually guarantee sequential access, though some interleaving may occur if there are multiple sinks reading from the channel simultaneously.

          -peter

          [1] https://cwiki.apache.org/FLUME/how-to-contribute.html

          Show
          Peter Newcomb added a comment - Sorry, I didn't mean to offend. I was trying to follow the instructions on the how-to-contribute [1] page, but failed to recognize the difference between assigning a JIRA to myself and REassigning a JIRA to myself. I did attempt to describe what I was doing as early as I could in the comments above (though I'm sure I could have done better), but as I was primarily focused on meeting an internal deadline I could not wait to write code until after public discussion. I'm perfectly OK with my code and even approach to be rejected as either inadequate or simply not desired, but wanted to share in case it was useful. I'm also happy to shelve this and join in discussion on a FEP on the subject as you suggest. Anyway, the code I posted does actually guarantee sequential access, though some interleaving may occur if there are multiple sinks reading from the channel simultaneously. -peter [1] https://cwiki.apache.org/FLUME/how-to-contribute.html
          Hide
          E. Sammer added a comment -

          I really don't think the implementation as individual files is going to fly. We're basically turning guaranteed sequential access into random access both on write and read. I realize I haven't had the time to complete this work so I'd like to turn a design of this into a FEP[1] that we all agree on and then let whoever wants to hack on it do so. I also have all my work thus far on a branch which I'm happy to share.

          Peter, fwiw, it's generally considered nice to talk to someone before reassigning a JIRA.

          [1] https://cwiki.apache.org/confluence/display/FLUME/Flume+Enhancement+Proposals

          Show
          E. Sammer added a comment - I really don't think the implementation as individual files is going to fly. We're basically turning guaranteed sequential access into random access both on write and read. I realize I haven't had the time to complete this work so I'd like to turn a design of this into a FEP [1] that we all agree on and then let whoever wants to hack on it do so. I also have all my work thus far on a branch which I'm happy to share. Peter, fwiw, it's generally considered nice to talk to someone before reassigning a JIRA. [1] https://cwiki.apache.org/confluence/display/FLUME/Flume+Enhancement+Proposals
          Hide
          Peter Newcomb added a comment -

          This dependency is not intrinsic, it's just that the proposed patch happens to depend on the patch I provided for FLUME-935

          Show
          Peter Newcomb added a comment - This dependency is not intrinsic, it's just that the proposed patch happens to depend on the patch I provided for FLUME-935
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3520/
          -----------------------------------------------------------

          Review request for Flume.

          Summary
          -------

          I've filled in the existing FileChannel skeleton, but have simplified it in that I'm persisting each event as its own file in order to avoid problems with asynchronous source and sink transaction boundaries. There are 4 subdirectories that each file transitions among: writing, complete, reading, and removed. Within each subdirectory, event files are contained within transaction subdirectories, so that whole transactions can be moved from one state to another atomically (at least to the limits of the underlying filesystem).

          While it will work in isolation, it is intended to support high volume via external batching and unbatching of events as I have described in an earlier comment in FLUME-896.

          This patch also depends on the patch I submitted for FLUME-935.

          This addresses bug FLUME-896.
          https://issues.apache.org/jira/browse/FLUME-896

          Diffs


          /branches/flume-728/flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java 1232571
          /branches/flume-728/flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java 1232571

          Diff: https://reviews.apache.org/r/3520/diff

          Testing
          -------

          I've included a few new unit tests (though the coverage is still mostly happy-path), and this code has also been integration-tested as part of a production release, including various failure modes both contrived and real.

          Thanks,

          Peter

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3520/ ----------------------------------------------------------- Review request for Flume. Summary ------- I've filled in the existing FileChannel skeleton, but have simplified it in that I'm persisting each event as its own file in order to avoid problems with asynchronous source and sink transaction boundaries. There are 4 subdirectories that each file transitions among: writing, complete, reading, and removed. Within each subdirectory, event files are contained within transaction subdirectories, so that whole transactions can be moved from one state to another atomically (at least to the limits of the underlying filesystem). While it will work in isolation, it is intended to support high volume via external batching and unbatching of events as I have described in an earlier comment in FLUME-896 . This patch also depends on the patch I submitted for FLUME-935 . This addresses bug FLUME-896 . https://issues.apache.org/jira/browse/FLUME-896 Diffs /branches/flume-728/flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java 1232571 /branches/flume-728/flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java 1232571 Diff: https://reviews.apache.org/r/3520/diff Testing ------- I've included a few new unit tests (though the coverage is still mostly happy-path), and this code has also been integration-tested as part of a production release, including various failure modes both contrived and real. Thanks, Peter
          Hide
          Peter Newcomb added a comment -

          Hi Praveen,

          Thanks for the comment. It is true that this code does not attempt to
          address all of those concerns--due to conscious choice on my part. In
          what I gather is the spirit of Flume NG, I implemented this as a
          simple component designed to be combined with other components in
          order to address a broad range of use cases, rather than attempting to
          address those use cases directly from within the component itself.
          Also, as just one of many implementations of Channel, it need not be
          the only way of temporarily storing events in a filesystem.

          Your comment makes me realize, however, that I have failed to describe
          the rest of the sort of context in which I see this implementation
          being used. Let me sketch out how I'm actually using it. At the
          "agent" tier:

          SOURCE | PseudoTxnMemoryChannel | BatchCollector | FileChannel | AvroSink

          And at the "collector" tier:

          AvroSource | FileChannel | BatchSplitter | SynchronousChannel | HDFSEventSink

          I've embedded the "agent" pipeline into our existing service (a
          specialized web server for collecting data through tracking pixel
          hits), so SOURCE is really just that service directly calling put() on
          the PseudoTxnMemoryChannel.

          BatchCollector and BatchSplitter are effectively both sinks and
          sources, though they officially implement the Sink interface, not the
          Source interface. The do what their names imply: the first collects a
          set of events as a batch, emitting a single event whose body contains
          the entire set of events, while the second takes a batch event created
          by the first and re-emits each individual event. The batches may be
          compressed if desired, and transaction semantics are adhered to
          throughout.

          SynchronousChannel is a special in-memory channel based on
          SynchronousQueue that is designed to safely mediate between sources
          and sinks that may have differing transaction boundaries (i.e., number
          of events/transaction). Unlike MemoryChannel, SynchronousChannel does
          not allow a source's transaction to succeed unless the events put()
          during that transaction have not only been take()n by a sink, but that
          the sink's transaction has fully committed.

          FYI, I do intend to contribute all of these components, but have been
          limited by how much time I can spend on contribution efforts. If
          there's interest I'll try to do it sooner rather than later.

          As to the point about porting implementations from other projects:
          while it is true that I have not copied an implementation from another
          OS project, I have poured into this implementation many years of
          experience implementing Flume-like systems, and its mechanisms are
          ones that are tried and true, at least when combined with the batching
          mechanism I descibed above.

          All of that said, it may simply be that this is not the implementation
          envisioned for the Channel implementation named "FileChannel", which
          is perfectly fine... I'm not wed to this implementation being named
          anything in particular, nor even to it be adopted by Flume at all--I
          created it only because the existing JDBC channel is too heavyweight
          for our particular application.

          -peter

          Show
          Peter Newcomb added a comment - Hi Praveen, Thanks for the comment. It is true that this code does not attempt to address all of those concerns--due to conscious choice on my part. In what I gather is the spirit of Flume NG, I implemented this as a simple component designed to be combined with other components in order to address a broad range of use cases, rather than attempting to address those use cases directly from within the component itself. Also, as just one of many implementations of Channel, it need not be the only way of temporarily storing events in a filesystem. Your comment makes me realize, however, that I have failed to describe the rest of the sort of context in which I see this implementation being used. Let me sketch out how I'm actually using it. At the "agent" tier: SOURCE | PseudoTxnMemoryChannel | BatchCollector | FileChannel | AvroSink And at the "collector" tier: AvroSource | FileChannel | BatchSplitter | SynchronousChannel | HDFSEventSink I've embedded the "agent" pipeline into our existing service (a specialized web server for collecting data through tracking pixel hits), so SOURCE is really just that service directly calling put() on the PseudoTxnMemoryChannel. BatchCollector and BatchSplitter are effectively both sinks and sources, though they officially implement the Sink interface, not the Source interface. The do what their names imply: the first collects a set of events as a batch, emitting a single event whose body contains the entire set of events, while the second takes a batch event created by the first and re-emits each individual event. The batches may be compressed if desired, and transaction semantics are adhered to throughout. SynchronousChannel is a special in-memory channel based on SynchronousQueue that is designed to safely mediate between sources and sinks that may have differing transaction boundaries (i.e., number of events/transaction). Unlike MemoryChannel, SynchronousChannel does not allow a source's transaction to succeed unless the events put() during that transaction have not only been take()n by a sink, but that the sink's transaction has fully committed. FYI, I do intend to contribute all of these components, but have been limited by how much time I can spend on contribution efforts. If there's interest I'll try to do it sooner rather than later. As to the point about porting implementations from other projects: while it is true that I have not copied an implementation from another OS project, I have poured into this implementation many years of experience implementing Flume-like systems, and its mechanisms are ones that are tried and true, at least when combined with the batching mechanism I descibed above. All of that said, it may simply be that this is not the implementation envisioned for the Channel implementation named "FileChannel", which is perfectly fine... I'm not wed to this implementation being named anything in particular, nor even to it be adopted by Flume at all--I created it only because the existing JDBC channel is too heavyweight for our particular application. -peter
          Hide
          Praveen Ramachandra added a comment -

          Hi Peter,

          I am not sure if this is the best approach for a system that has to deal with lots and lots of data.

          Some of the things we need to validate against

          1. What if the event source produce an event at a time, essentially transaction boundary is one event
          2. What is the system behavior if 99.99% of the transactions doesn't do a rollback i.e., actually does the commit
          3. How do we support applications who need a relaxed guarantees as opposed to complete guarantees like a database system.

          On the contrary, some questions that we could ask ourselves is
          1. Why cant we store the uncommitted data in memory. If the server (i.e., flume node) goes down then only uncommitted data is lost and hence is not an issue. Application can deal with such scenarios
          2. Even if we persist uncommitted data, how does a source request to resurrect a transaction once a node comes back up. There is no such thing as "oh! this is the addressable handle of my last transaction, please continue from where we left off". Even if we can build such complex semantics, important question is do we really need to?

          As I have said in the past please (please) look at current implementations that solve the same problem and port the working implementation (e.g., scribe, kafka). We don't have the baggage of JMS, JDBC etc., and we can build the reliability and keep it tunable for individual application needs.

          Hope this helps.


          Regards,
          Praveen Ramachandra

          Show
          Praveen Ramachandra added a comment - Hi Peter, I am not sure if this is the best approach for a system that has to deal with lots and lots of data. Some of the things we need to validate against 1. What if the event source produce an event at a time, essentially transaction boundary is one event 2. What is the system behavior if 99.99% of the transactions doesn't do a rollback i.e., actually does the commit 3. How do we support applications who need a relaxed guarantees as opposed to complete guarantees like a database system. On the contrary, some questions that we could ask ourselves is 1. Why cant we store the uncommitted data in memory. If the server (i.e., flume node) goes down then only uncommitted data is lost and hence is not an issue. Application can deal with such scenarios 2. Even if we persist uncommitted data, how does a source request to resurrect a transaction once a node comes back up. There is no such thing as "oh! this is the addressable handle of my last transaction, please continue from where we left off". Even if we can build such complex semantics, important question is do we really need to? As I have said in the past please (please) look at current implementations that solve the same problem and port the working implementation (e.g., scribe, kafka). We don't have the baggage of JMS, JDBC etc., and we can build the reliability and keep it tunable for individual application needs. Hope this helps. – Regards, Praveen Ramachandra
          Hide
          Peter Newcomb added a comment -

          I've filled in the existing FileChannel skeleton, but have simplified it in that I'm persisting each event as its own file in order to avoid problems with asynchronous source and sink transaction boundaries. There are 4 subdirectories that each file transitions among: writing, complete, reading, and removed. Within each subdirectory, event files are contained within transaction subdirectories, so that whole transactions can be moved from one state to another atomically (at least to the limits of the underlying filesystem).

          Though I have this code up and running and on its way towards production, I have not yet completed performance and load testing nor test cases, and hence have not posted the patch for review, though I hope to do so soon.

          Show
          Peter Newcomb added a comment - I've filled in the existing FileChannel skeleton, but have simplified it in that I'm persisting each event as its own file in order to avoid problems with asynchronous source and sink transaction boundaries. There are 4 subdirectories that each file transitions among: writing, complete, reading, and removed. Within each subdirectory, event files are contained within transaction subdirectories, so that whole transactions can be moved from one state to another atomically (at least to the limits of the underlying filesystem). Though I have this code up and running and on its way towards production, I have not yet completed performance and load testing nor test cases, and hence have not posted the patch for review, though I hope to do so soon.

            People

            • Assignee:
              Brock Noland
              Reporter:
              E. Sammer
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development