Kafka / KAFKA-687

Rebalance algorithm should consider partitions from all topics

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The current rebalance step, as stated in the original Kafka paper [1], splits the partitions of each topic between all the consumers, one topic at a time. So if you have 100 topics with 2 partitions each and 10 consumers, only two consumers will be used, because every topic hands its two partitions to the same first two consumers. That is, for each topic, all partitions are listed and shared between the consumers in the consumer group in order (not randomly).

      If the consumer group is reading from several topics at the same time, it makes sense to split all the partitions from all topics between all the consumers. Following the example, we would have 200 partitions in total, 20 per consumer, using all 10 consumers.

      The load per topic could be different, and the division should take this into account. However, even a random division should be better than the current algorithm when reading from several topics, and should not harm reading from a few topics with several partitions.
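
      To make the arithmetic in the example concrete, here is a minimal sketch in Python (not Kafka's actual consumer code). The function names `assign_per_topic` and `assign_across_topics`, and the topic and consumer names, are hypothetical. The first function splits each topic on its own, in order; the second pools the partitions of all topics and deals them out round-robin, which is only one possible way to do the global split.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Split each topic's partitions, in order, among the sorted consumers.

    Every topic is handled independently of the others (assumed model of
    the behaviour described in this issue).
    """
    consumers = sorted(consumers)
    assignment = defaultdict(list)
    for topic, num_partitions in sorted(topics.items()):
        per_consumer, extra = divmod(num_partitions, len(consumers))
        start = 0
        for i, consumer in enumerate(consumers):
            # The first `extra` consumers take one additional partition.
            count = per_consumer + (1 if i < extra else 0)
            for partition in range(start, start + count):
                assignment[consumer].append((topic, partition))
            start += count
    return assignment


def assign_across_topics(topics, consumers):
    """Pool the partitions of all topics and deal them out round-robin."""
    consumers = sorted(consumers)
    assignment = defaultdict(list)
    all_partitions = [
        (topic, partition)
        for topic, num_partitions in sorted(topics.items())
        for partition in range(num_partitions)
    ]
    for i, topic_partition in enumerate(all_partitions):
        assignment[consumers[i % len(consumers)]].append(topic_partition)
    return assignment


# The scenario from the description: 100 topics x 2 partitions, 10 consumers.
topics = {f"topic-{i:03d}": 2 for i in range(100)}
consumers = [f"consumer-{i}" for i in range(10)]

per_topic = assign_per_topic(topics, consumers)
across = assign_across_topics(topics, consumers)

print(len(per_topic), "consumers used with the per-topic split")
print(len(across), "consumers used with the all-topics split")
```

      In this sketch the per-topic split gives 100 partitions each to two consumers and nothing to the other eight, while the pooled split gives 20 partitions to each of the 10 consumers. Neither function accounts for differing load per topic.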

        Issue Links

          Activity

          Pablo Barrera created issue -
          Neha Narkhede made changes -
          Field Original Value New Value
          Affects Version/s 0.8.1 [ 12322960 ]
          Jay Kreps added a comment -

          This is a very good point, and not one I had considered.

          It is probably not a trivial change because right now I think the election is done for each topic independently.

          We have in mind in the next major release after 0.8 (0.9, presumably) to move this co-ordination to the server, which would be a good time to fix this. We could either do this balancing exactly or else just randomize the start index (which would be almost as good if you had many topics).
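
          As an illustration of the "randomize the start index" option, here is a rough Python sketch (not Kafka code). It assumes the start index is derived from a hash of the topic name so that every consumer computes the same layout without coordinating; that choice, and the names `start_index` and `assign_with_rotated_start`, are hypothetical. For simplicity, partitions are dealt one at a time from the rotated start rather than in contiguous ranges.

```python
import hashlib


def start_index(topic, num_consumers):
    """Pick a per-topic start position that is stable across processes.

    Assumption: an MD5-based hash is used because Python's built-in hash()
    of a string differs between interpreter runs.
    """
    digest = hashlib.md5(topic.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_consumers


def assign_with_rotated_start(topics, consumers):
    """Per-topic assignment, but each topic starts at a different consumer."""
    consumers = sorted(consumers)
    n = len(consumers)
    assignment = {consumer: [] for consumer in consumers}
    for topic, num_partitions in sorted(topics.items()):
        offset = start_index(topic, n)
        for partition in range(num_partitions):
            owner = consumers[(offset + partition) % n]
            assignment[owner].append((topic, partition))
    return assignment


topics = {f"topic-{i:03d}": 2 for i in range(100)}
consumers = [f"consumer-{i}" for i in range(10)]

rotated = assign_with_rotated_start(topics, consumers)
for consumer, partitions in rotated.items():
    print(consumer, len(partitions))
```

          Each topic is still assigned independently, so the existing per-topic structure is kept. The resulting load is only approximately even, and it tends to even out as the number of topics grows.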

          Neha Narkhede made changes -
          Affects Version/s 0.9 [ 12323928 ]
          Affects Version/s 0.8.1 [ 12322960 ]
          Joel Koshy made changes -
          Link This issue is duplicated by KAFKA-564 [ KAFKA-564 ]
          Joel Koshy added a comment -

          Although we are working on the new consumer, some people were interested
          (offline) in getting some form of this done in the current consumer at least
          for wildcard consumption. It should (in theory) be simple to do, but may
          take a couple days because the wildcard consumption part of the code is
          convoluted - mainly because when it was done we did not want to modify the
          existing consumer too much.

          Anyway, I dumped some thoughts in the comments of this gist:
          https://gist.github.com/jjkoshy/5c3d065161153b7b1ee3 and the unit test at
          the end provides one possible partition layout strategy.

          Neha Narkhede made changes -
          Assignee Sriharsha Chintalapani [ sriharsha ]

            People

            • Assignee:
              Sriharsha Chintalapani
              Reporter:
              Pablo Barrera
            • Votes:
              2
              Watchers:
              7
