Flume / FLUME-1201

Create a buffer channel that stores overflow from a fast, low-capacity channel in a slower, high-capacity channel

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Channel
    • Labels:
      None

      Description

      As it stands, users need to make a choice between either slower channels or faster ones. The current "middle ground" consists of RecoverableFileChannel which still stores everything to disk. MemoryChannel on the other hand has limited capacity or a high memory footprint.
      I propose to implement a buffer channel, somewhat like the buffer store in scribed.
      It would normally act as a proxy to the primary channel. Should the primary channel be unable to receive events (normally because it is at capacity, though future channels may have other failure cases), it would switch to a buffering state and store new events in the secondary channel.
      While buffering, items continue to be read from the primary channel, which attempts to "refill" itself from the secondary. Once the secondary is found to be empty, operation switches back to streaming mode, with items going directly to the primary.

      The main objective of this would thus be to have a high throughput channel as the primary mode of operation, allowing it to switch over when takes are not keeping up with puts.
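The streaming/buffering behaviour described above can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python and not Flume's Java Channel API: `BoundedChannel`, `ChannelFull`, and `BufferChannel` are hypothetical names, transactions are ignored, and the refill step peeks at the primary's capacity for brevity, which a real compound channel would ideally avoid.

```python
from collections import deque


class ChannelFull(Exception):
    """Raised by a channel that cannot accept another event."""


class BoundedChannel:
    """Hypothetical stand-in for a real channel: a FIFO with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        if len(self._events) >= self.capacity:
            raise ChannelFull()
        self._events.append(event)

    def take(self):
        # Returns None when the channel is empty.
        return self._events.popleft() if self._events else None

    def __len__(self):
        return len(self._events)


class BufferChannel:
    """Proxy to `primary`; overflows into `secondary` while the primary is full."""

    STREAMING = "streaming"
    BUFFERING = "buffering"

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.state = self.STREAMING

    def put(self, event):
        if self.state == self.STREAMING:
            try:
                self.primary.put(event)
                return
            except ChannelFull:
                # Primary cannot accept the event: start buffering.
                self.state = self.BUFFERING
        # While buffering, every new event goes to the secondary so that
        # older events in the primary are still delivered first.
        self.secondary.put(event)

    def take(self):
        # Reads always come from the primary.
        event = self.primary.take()
        if self.state == self.BUFFERING:
            self._refill()
        return event

    def _refill(self):
        # Move the oldest buffered events into the space freed in the primary.
        while len(self.secondary) > 0 and len(self.primary) < self.primary.capacity:
            self.primary.put(self.secondary.take())
        if len(self.secondary) == 0:
            # Secondary drained: go back to writing directly to the primary.
            self.state = self.STREAMING
```

For example, with `BufferChannel(BoundedChannel(2), BoundedChannel(100))`, putting five events switches the channel to buffering on the third put, and taking them back yields the events in their original order before the channel returns to streaming.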

        Activity

        Juhani Connolly added a comment -

        The implementation of this would create a compound channel, which at the moment the community feels may be too much work to support. Jarek has opened a separate ticket, FLUME-1227, that addresses the immediate concern of this ticket with a channel that has the functionality fixed.

        Juhani Connolly added a comment -

        OK, I'm going to close this in that case, seeing as we don't really seem to be too keen on allowing compound channels.

        Jarek Jarcec Cecho added a comment -

        I've opened FLUME-1227 to cover the "SpillableChannel".

        Juhani Connolly added a comment -

        This works for me too.

        I do like the current buffer store in scribed, but its exact mechanics certainly cannot be set up with Flume.

        If we're happy with it not being configurable, limited to just memory/file, then we could just implement it as a separate channel, though I suspect there would be a fair bit of code repetition with existing channels, which in turn would make for more maintenance.

        The point about a compound channel quadratically increasing the things to test is valid... sort of. Ultimately, though, it should not need to be aware of the channels below it and should rely only on the backing channels fulfilling their contract, so testing it would involve testing only that the compound channel itself works as intended.

        Again though, if only having a memory channel spilling to file is enough, doing an explicit implementation of that could work.

        Arvind Prabhakar added a comment -

        Hi Jarcec, thanks for taking this up. To ensure that there is no confusion between the purpose of this Jira and what you are suggesting, I suggest you create a new Jira specifically for it and assign it to yourself.

        Also, it would be good to put a requirement that it should not depend upon any internal implementation details of other channels so as to not be tightly coupled with them.

        Joe Crobak added a comment -

        I'm with Jarcec. A SpillableChannel is exactly what we'd like. Particularly in EC2, where IO performance is variable, it'd be nice to bypass disk IO whenever possible.

        Jarek Jarcec Cecho added a comment -

        I also have concerns about compound channels. The current design is quite straightforward, easy to understand and (in comparison to flume-og) working. I'm afraid that compound channels would introduce a lot of confusion in the user base and might lead to user issues and a bad impression of Flume in general.

        However, the idea of a memory (=fast) channel that would spill data to disk if it became full (because of an issue on another agent down the road) occurred to me as well, because that's exactly what scribe is doing and what my current employer wants to do. I'm volunteering to write such a "SpillableChannel" if we agree that it's acceptable.

        Jarcec

        Mike Percy added a comment -

        Just to clarify my previous comment: let's make sure we clearly define a contract for the durability guarantees of every channel that ships with Flume. i.e. what happens if I yank the power cord on a running system: do I lose all of the events in the channel?

        Mike Percy added a comment -

        Hey, sorry for not chiming in earlier. I have concerns about trying to make channels composable for the following reasons:

        1. Potential for a complex implementation and configuration
        2. It could confuse the durability guarantees of the compound channel, which today are very clear (either durable or not)
        3. It would quadratically increase the testing surface area (take the outer product of all channels and then make sure they work together)

        So from my perspective, a memory channel that spills to disk, while still potentially suffering from #2, would best be added as a separate channel type of its own.

        Best,
        Mike
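To make the "outer product" in point 3 concrete, here is a small sketch that enumerates the (primary, secondary) pairings a composable design would need to cover. The function name and the channel names in the usage note are illustrative assumptions, not an inventory of the channels Flume ships.

```python
from itertools import product


def compound_test_matrix(channel_types):
    """Return every (primary, secondary) pairing of the given channel types."""
    return list(product(channel_types, repeat=2))
```

With three channel types, such as `["memory", "file", "jdbc"]`, this gives 9 pairings; adding a fourth type raises it to 16, so the number of combinations grows with the square of the number of channel types.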

        Juhani Connolly added a comment -

        Due to the current nature of channel instantiation in PropertiesFileConfigurationProvider, it seems "wrong" to instantiate the subchannels from the channel itself (which initially seemed the most viable solution).

        One solution would be to implement channelgroups like the existing sinkgroups, but personally I find this quite unwieldy, as it further complicates configuration.
        I'd prefer some other alternative and am open to ideas.
        Unfortunately, I do not believe this could be implemented as a channel selector, at least not one that maintains event order. A failover channel selector with the sink pulling from both channels would make for a rudimentary implementation, dependent on configuration.

        I'd be curious to hear other people's thoughts.
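The ordering concern with a failover selector can be illustrated with a small sketch. It assumes a source that writes to the primary and fails over to the secondary when the primary is full, and a sink that drains the primary before the secondary. `BoundedQueue`, `failover_put`, `sink_take`, and `demo` are hypothetical names for illustration only and do not correspond to Flume classes.

```python
from collections import deque


class BoundedQueue:
    """Hypothetical stand-in for a channel with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def offer(self, event):
        # Returns False instead of raising when the queue is full.
        if len(self.items) >= self.capacity:
            return False
        self.items.append(event)
        return True

    def poll(self):
        # Returns None when the queue is empty.
        return self.items.popleft() if self.items else None


def failover_put(primary, secondary, event):
    """Selector side: try the primary first, fall back to the secondary."""
    if not primary.offer(event) and not secondary.offer(event):
        raise RuntimeError("both channels are full")


def sink_take(primary, secondary):
    """Sink side: prefer the primary, then the secondary."""
    event = primary.poll()
    return event if event is not None else secondary.poll()


def demo():
    primary, secondary = BoundedQueue(2), BoundedQueue(100)
    delivered = []
    for event in (0, 1, 2):          # event 2 overflows into the secondary
        failover_put(primary, secondary, event)
    delivered.append(sink_take(primary, secondary))
    failover_put(primary, secondary, 3)  # primary has room again, so 3 lands there
    while (event := sink_take(primary, secondary)) is not None:
        delivered.append(event)
    return delivered
```

Here `demo()` returns `[0, 1, 3, 2]`: event 3 overtakes event 2 because nothing forces new events into the secondary while older events are still buffered there.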


          People

          • Assignee: Juhani Connolly
          • Reporter: Juhani Connolly
          • Votes: 1
          • Watchers: 6
