Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      The storm-contrib project shows how to push/emit events to a Storm spout [1]. It would be nice to create a flume-storm sink that emits events to a Storm spout's output collector [2].

      [1] https://github.com/nathanmarz/storm-contrib/tree/master/storm-scribe
      [2] https://github.com/nathanmarz/storm-contrib/blob/master/storm-scribe/src/jvm/storm/scribe/ScribeSpout.java

        Activity

        Ravikumar Visweswara added a comment -

        The Flume spout can be found at the link below:
        https://github.com/ExpediaInc/edw-Storm-Flume-Connectors

        Ashish Paliwal added a comment -

        Is anyone working on this? I would like to take it up.

        Hari Shreedharan added a comment -

        Please feel free

        Ashish Paliwal added a comment -

        Started working on this. Technically we cannot call it a "Storm Sink", as a Sink's lifecycle is managed by Flume, whereas a Storm spout has its own lifecycle managed by Storm.

        It would need to be built as some kind of connector between a Flume Sink and a Storm spout. If we had a Kafka Sink, the Kafka spout could be used. Alternatively, we could use an embedded agent inside the spout and drain the Channel inside the nextTuple() method.

        I shall experiment a bit more with Storm and try to explore other approaches as well. If anyone has ideas, it would be great to discuss them.
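
        The "drain the Channel inside nextTuple()" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the connector's code: a BlockingQueue stands in for the Flume Channel, and TupleCollector and ChannelDrainingSpout are hypothetical names, not Storm or Flume classes. A real spout's nextTuple() returns void; the count returned here exists only to make the demo observable.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a spout output collector (not the Storm API).
interface TupleCollector {
    void emit(byte[] eventBody);
}

// Hypothetical spout-like class; the queue plays the role of a Flume Channel.
class ChannelDrainingSpout {
    private final BlockingQueue<byte[]> channel;
    private final TupleCollector collector;
    private final int maxBatch;

    ChannelDrainingSpout(BlockingQueue<byte[]> channel, TupleCollector collector, int maxBatch) {
        this.channel = channel;
        this.collector = collector;
        this.maxBatch = maxBatch;
    }

    // Takes at most maxBatch events that are already queued, emits them,
    // and returns immediately. It never waits for new events to arrive.
    int nextTuple() {
        List<byte[]> batch = new ArrayList<>();
        channel.drainTo(batch, maxBatch);
        for (byte[] body : batch) {
            collector.emit(body);
        }
        return batch.size();
    }
}

class ChannelDrainDemo {
    public static void main(String[] args) {
        BlockingQueue<byte[]> channel = new LinkedBlockingQueue<>();
        channel.add("event-1".getBytes(StandardCharsets.UTF_8));
        channel.add("event-2".getBytes(StandardCharsets.UTF_8));

        ChannelDrainingSpout spout = new ChannelDrainingSpout(
                channel,
                body -> System.out.println(new String(body, StandardCharsets.UTF_8)),
                10);
        int emitted = spout.nextTuple();
        System.out.println("emitted " + emitted + " events");
    }
}
```

        The sketch leaves out what makes the real problem hard: Flume transactions, acking and replay of failed tuples, and the lifecycle mismatch described above.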

        Ashish Paliwal added a comment -

        Did a little more work. The embedded agent is out of scope because of its Embedded source. I planned to use application code instead, which gave me the idea of implementing the spout in a way similar to how we write unit tests for Sources. It kind of worked; I shall work out a beta version of the connector in the next couple of days.

        Ashish Paliwal added a comment -

        Hari Shreedharan, Mike Percy: Any suggestions on which package this implementation should go into? It doesn't belong in Sinks for sure.

        Mike Percy added a comment -

        Personally, I'm not familiar with Storm and would not help maintain a Storm sink, so I would not review this.

        Ravikumar Visweswara added a comment -

        Hello guys,

        Below is my old POC code, which does something very similar to what you described. It also includes an example topology.

        https://github.com/rvisweswara/flume-storm-connector

        The repo also contains an Avro Sink Bolt to push data from Storm to Flume.

        Limitations of the spout:

        • Multiple instances of the spout can't be started on the same machine because of port conflicts. The code can be modified to assign dynamic ports per instance, but that is not ideal.
        • In real-world cases, Flume sinks expect a known host and port. But with Storm, the spout (and the Flume Avro source inside it) can run on any machine. To use this, one needs to write a custom Storm scheduler that runs the spout on a known IP address.
        • The spout can use a File channel for persistence. If the spout instance is moved to a different machine, messages in the old channel will be left behind.
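
        The "dynamic ports per instance" workaround from the first limitation can be illustrated with plain Java sockets. This is a minimal sketch, not code from the POC repo: EphemeralPorts and bindAnyFreePort are hypothetical names, and a bare ServerSocket stands in for the Avro source's listener. Binding to port 0 asks the operating system for any free port, so two instances on one machine do not collide.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper; a plain ServerSocket stands in for the spout's listener.
class EphemeralPorts {
    // Port 0 tells the OS to choose a currently free port.
    static ServerSocket bindAnyFreePort() throws IOException {
        return new ServerSocket(0);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket first = bindAnyFreePort();
             ServerSocket second = bindAnyFreePort()) {
            // Both sockets are open at once, so the OS must give them different ports.
            System.out.println("instance 1 listening on port " + first.getLocalPort());
            System.out.println("instance 2 listening on port " + second.getLocalPort());
        }
    }
}
```

        This also shows why the workaround is not ideal: the port is only known after binding, so the Flume sink on the other side still needs some way to discover it, which runs straight into the second limitation.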

        The code worked fine for the POC. But because of the above limitations, I ended up writing a Kafka sink (which I will share shortly) for better reliability.

        Ashish Paliwal added a comment -

        Mike Percy: It's not related to a review request. I am not able to fit the implementation logically into any of the modules. This is like a bridge between Flume and Storm. The implementation would have a sort of embedded agent that listens to a Flume Sink and pushes messages into Storm.

        @Ravi - The implementation that I am working on is similar, but follows a slightly different path.

        As I work on this, I feel that using a Kafka Sink, or other Sinks for which spouts already exist, would be a cleaner solution.

        Suggestions/Comments?

        Gabriel Commeau added a comment -

        Hi,

        We wrote a Flume-to-Storm connector that we’ve been using in production systems at Comcast for over a year. We presented it as part of our talk about real-time stream processing at Hadoop World this year. See the slides of the presentation here (slide 13 most importantly): http://strataconf.com/stratany2013/public/schedule/detail/30915
        I’m still working on open-sourcing it, but I didn’t envision it being part of either Flume or Storm, as it sits between the two. Indeed, I’m not sure we want to bring all the Storm dependencies into Flume, or vice versa. Also, the way we designed it, there is a Storm sink and a Flume spout, each component running within its own framework. So I started an independent GitHub project. I’ll keep you posted as I make progress.
        Using a Kafka sink would definitely work, especially because Storm already has a Kafka spout in storm-contrib. The obvious downside is that you introduce another set of servers (more maintenance, increased latency, ...).

        Ashish Paliwal added a comment -

        Good to know. IMHO, what you stated is true. Please keep us posted on future developments on your side.

        Meanwhile, I shall continue my work on this and experiment to see if I can come up with more ideas/approaches. If we can make an independent spout that listens on the network for messages from a Flume sink, this could work. Let's try a few ideas and see what Flumers like most.


          People

          • Assignee:
            Ashish Paliwal
            Reporter:
            Mubarak Seyed
          • Votes:
            11
            Watchers:
            18

            Dates

            • Created:
              Updated:

              Development