Samza / SAMZA-2

Fine-grain control over stream consumption

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: container
    • Labels: design

      Description

      Currently, Samza exposes configuration of the form "streams.%s.consumer.max.bytes.per.sec" for throttling the number of bytes a task will read from a stream. This is a feature request for programmatic, fine-grained control over stream consumption. The use case is a Samza task that consumes multiple streams, where some streams come from live systems with stricter SLA requirements and must always be prioritized over other streams that come from batch systems. The configuration above is not an ideal way to express this kind of stream prioritization, because configuring the "batch" streams with a low consumption rate decreases the overall throughput of the system when there is no data in the "live" streams. Furthermore, we want to throttle each "batch" stream based on external signals that can change over time. Because these signals are dynamic, we would like a programmatic interface that can change the prioritization as the signals change.
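
      To make the requested behavior concrete, here is a minimal sketch of a priority-based chooser. It is an illustration only and not Samza's actual API: the class name PriorityChooser, its methods (setPriority, update, choose), and the use of plain strings for stream names and messages are all assumptions made for this example. It shows the two properties asked for above: "batch" streams run at full speed whenever "live" streams are empty, and priorities can be changed at runtime in response to an external signal.

      ```java
      import java.util.ArrayDeque;
      import java.util.LinkedHashMap;
      import java.util.Map;
      import java.util.Queue;

      /**
       * Hypothetical sketch (not Samza's API): picks the next message from
       * whichever buffered stream currently has the highest priority.
       */
      class PriorityChooser {
          // Per-stream FIFO buffers of messages waiting to be processed.
          // LinkedHashMap keeps first-seen order, so ties go to the stream seen first.
          private final Map<String, Queue<String>> buffers = new LinkedHashMap<>();
          // Per-stream priority; higher wins. Unknown streams default to 0.
          private final Map<String, Integer> priorities = new LinkedHashMap<>();

          /** Set or change a stream's priority, e.g. when an external signal changes. */
          void setPriority(String stream, int priority) {
              priorities.put(stream, priority);
          }

          /** Buffer a message that has arrived on the given stream. */
          void update(String stream, String message) {
              buffers.computeIfAbsent(stream, s -> new ArrayDeque<>()).add(message);
          }

          /** Return the next message to process, or null if nothing is buffered. */
          String choose() {
              String best = null;
              for (Map.Entry<String, Queue<String>> e : buffers.entrySet()) {
                  if (e.getValue().isEmpty()) {
                      continue; // empty streams never block lower-priority ones
                  }
                  if (best == null || priorityOf(e.getKey()) > priorityOf(best)) {
                      best = e.getKey();
                  }
              }
              return best == null ? null : buffers.get(best).poll();
          }

          private int priorityOf(String stream) {
              return priorities.getOrDefault(stream, 0);
          }

          public static void main(String[] args) {
              PriorityChooser chooser = new PriorityChooser();
              chooser.setPriority("live", 10);
              chooser.setPriority("batch", 1);
              chooser.update("batch", "b1");
              chooser.update("batch", "b2");
              chooser.update("live", "l1");
              System.out.println(chooser.choose()); // l1: live outranks batch
              System.out.println(chooser.choose()); // b1: live is empty, batch proceeds
              chooser.setPriority("batch", 20);     // simulated external signal
              chooser.update("live", "l2");
              System.out.println(chooser.choose()); // b2: batch now outranks live
              System.out.println(chooser.choose()); // l2
          }
      }
      ```

      Unlike a fixed bytes-per-second cap, this approach never idles the task while any stream has data; the trade-off is that a busy high-priority stream can starve lower-priority ones indefinitely.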

      Review board:

      https://reviews.apache.org/r/13725/

      1. SAMZA-2.0.patch
        23 kB
        Chris Riccomini
      2. SAMZA-2.1.patch
        28 kB
        Chris Riccomini
      3. SAMZA-2.2.patch
        99 kB
        Chris Riccomini
      4. SAMZA-2.3.patch
        92 kB
        Chris Riccomini
      5. SAMZA-2.4.patch
        97 kB
        Chris Riccomini
      6. SAMZA-2.5.patch
        120 kB
        Chris Riccomini
      7. DESIGN-SAMZA-2-0.pdf
        177 kB
        Chris Riccomini
      8. DESIGN-SAMZA-2-0.md
        22 kB
        Chris Riccomini

        Issue Links

          Activity

          Chris Riccomini created issue -
          Chris Riccomini made changes -
          Field Original Value New Value
          Assignee Chris Riccomini [ criccomini ]
          Chris Riccomini made changes -
          Link This issue depends upon SAMZA-3 [ SAMZA-3 ]
          Chris Riccomini made changes -
          Link This issue is related to SAMZA-12 [ SAMZA-12 ]
          Chris Riccomini made changes -
          Affects Version/s 0.6.0 [ 12324905 ]
          Chris Riccomini made changes -
          Fix Version/s 0.7.0 [ 12324906 ]
          Chris Riccomini made changes -
          Component/s container [ 12320913 ]
          Chris Riccomini made changes -
          Assignee Chris Riccomini [ criccomini ]
          Chris Riccomini made changes -
          Attachment SAMZA-2.0.patch [ 12599349 ]
          Chris Riccomini made changes -
          Attachment SAMZA-2.1.patch [ 12600427 ]
          Chris Riccomini made changes -
          Description Appended to the description: "Design proposal: https://wiki.apache.org/samza/Pluggable%20MessageChooser"
          Chris Riccomini made changes -
          Description Appended to the description: "Review board: https://reviews.apache.org/r/13725/"
          Chris Riccomini made changes -
          Attachment SAMZA-2.2.patch [ 12605174 ]
          Chris Riccomini made changes -
          Attachment SAMZA-2.3.patch [ 12606204 ]
          Chris Riccomini made changes -
          Attachment SAMZA-2.4.patch [ 12606233 ]
          Chris Riccomini made changes -
          Attachment SAMZA-2.5.patch [ 12606647 ]
          Chris Riccomini made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Chris Riccomini made changes -
          Link This issue is related to SAMZA-67 [ SAMZA-67 ]
          Gavin made changes -
          Workflow classic default workflow [ 12809030 ] patch-available, re-open possible [ 12850506 ]
          Chris Riccomini made changes -
          Attachment DESIGN-SAMZA-2-0.pdf [ 12668108 ]
          Chris Riccomini made changes -
          Attachment DESIGN-SAMZA-2-0.md [ 12668109 ]
          Chris Riccomini made changes -
          Description Removed from the description: "Design proposal: https://wiki.apache.org/samza/Pluggable%20MessageChooser"
          Chris Riccomini made changes -
          Labels design

            People

            • Assignee: Chris Riccomini
            • Reporter: Chris Riccomini
            • Votes: 0
            • Watchers: 5

              Dates

              • Created:
                Updated:
                Resolved:
