Flume / FLUME-1507

Have "Topology Design Considerations" in User Guide

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.3.0
    • Component/s: None
    • Labels: None

      Description

      It would be nice if the User Guide had a section listing considerations for designing an end-to-end Flume topology. I think a lot of people get lost in the long list of sources, sinks, etc. and need a higher-level overview of what to think about when designing a flow. Examples would be:

      • When to use Flume? Types of data that Flume is good at handling (e.g. regularly generated, event-based, etc.).
      • Reliability (explaining that flow reliability is a function of channels used, redundancy in the flow, and other factors)
      • Flume sizing (some basic ideas about how to size the nodes or network you are running on)

      The design space is too large to give precise recommendations here, but simply orienting users toward the main things they need to think about would be really helpful. Some issues, like reliability, are much harder to explain for Flume NG than for Flume OG, and I think we need documentation that makes this explicit. Down the road, a "cookbook" with specific examples would be even better.

      Thoughts?

      1. FLUME-1507.v2.patch.txt
        6 kB
        NO NAME
      2. FLUME-1507.v1.patch.txt
        6 kB
        NO NAME

        Activity

        Jarek Jarcec Cecho added a comment -

        I would also recommend mentioning some concrete layer designs and their benefits.

        Off the top of my head, one option is a single layer: Flume agents run directly on the source machines and send data straight to the target destination (HDFS). The advantage is that you do not need any extra machines; the disadvantages are a limited ability to "cache" data when the target destination is down, and a high number of files created on HDFS. Another layering design would be to have two layers, where the second layer would be very close to the "collectors" in Flume OG...

        Jarcec
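
As an illustration of the two-layer design described above, here is a minimal sketch of a Flume NG agent configuration. It is not taken from the patch: the agent names (`edge`, `collector`), the host name, the port, and all paths are hypothetical placeholders, and the channel choices are only one possible trade-off.

```properties
# --- Tier 1: "edge" agent running on each source machine (names are hypothetical) ---
edge.sources  = r1
edge.channels = c1
edge.sinks    = k1

# Tail an application log (placeholder path)
edge.sources.r1.type     = exec
edge.sources.r1.command  = tail -F /var/log/app/app.log
edge.sources.r1.channels = c1

# Memory channel: fast, but buffered events are lost if the agent dies
edge.channels.c1.type     = memory
edge.channels.c1.capacity = 10000

# Forward events to the collector tier over Avro RPC
edge.sinks.k1.type     = avro
edge.sinks.k1.hostname = collector.example.com
edge.sinks.k1.port     = 4545
edge.sinks.k1.channel  = c1

# --- Tier 2: "collector" agent aggregating many edge agents ---
collector.sources  = r1
collector.channels = c1
collector.sinks    = k1

collector.sources.r1.type     = avro
collector.sources.r1.bind     = 0.0.0.0
collector.sources.r1.port     = 4545
collector.sources.r1.channels = c1

# File channel: durable buffering while HDFS is unavailable
collector.channels.c1.type          = file
collector.channels.c1.checkpointDir = /var/flume/checkpoint
collector.channels.c1.dataDirs      = /var/flume/data

collector.sinks.k1.type      = hdfs
collector.sinks.k1.hdfs.path = hdfs://namenode.example.com/flume/events
collector.sinks.k1.channel   = c1
```

In the single-layer variant, the `edge` agent would use the `hdfs` sink directly instead of the `avro` sink, which removes the extra machines but leaves buffering and HDFS file counts to every source host.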

        NO NAME added a comment -

        Hey Jarcec - your point is well taken. I think aggregation tiers and their pros/cons should also be covered in there.

        NO NAME added a comment -

        This patch updates the User Guide to have design considerations.

        NO NAME added a comment -

        This fixes spacing issues.

        Jarek Jarcec Cecho added a comment -

        Committed in 01a81121b8a028c06201a022b50f537af7a8de44.

        Thank you for your contribution Patrick!

        Jarcec

        Hudson added a comment -

        Integrated in flume-1.3.0 #7 (See https://builds.apache.org/job/flume-1.3.0/7/)
        FLUME-1507. Have "Topology Design Considerations" in User Guide. (Revision c8a93e4a588cc226bfbc803faead11eec2741364)

        Result = FAILURE
        jarcec : http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git;a=summary&a=commit&h=c8a93e4a588cc226bfbc803faead11eec2741364
        Files :

        • flume-ng-doc/sphinx/FlumeUserGuide.rst
        Hudson added a comment -

        Integrated in flume-trunk #293 (See https://builds.apache.org/job/flume-trunk/293/)
        FLUME-1507. Have "Topology Design Considerations" in User Guide. (Revision 01a81121b8a028c06201a022b50f537af7a8de44)

        Result = UNSTABLE
        jarcec : http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git;a=summary&a=commit&h=01a81121b8a028c06201a022b50f537af7a8de44
        Files :

        • flume-ng-doc/sphinx/FlumeUserGuide.rst

          People

          • Assignee: NO NAME
          • Reporter: NO NAME
          • Votes: 0
          • Watchers: 3
