Kafka
  KAFKA-264: Change the consumer side load balancing and distributed co-ordination to use a consumer co-ordinator
  KAFKA-265

Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.7.1
    • Component/s: core
    • Labels:
      None

      Description

      The correct fix for KAFKA-262 and other known issues with the current consumer rebalancing approach is to get rid of the cache in the zookeeper consumer.
      The side-effect of that fix, though, is a large number of zookeeper notifications, each of which triggers a full rebalance operation on the consumer.

      Ideally, the zookeeper notifications should be batched so that only one rebalance operation is triggered for several such ZK notifications.
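
      As a rough sketch of the batching idea (hypothetical names and structure, assumed here for illustration; this is not what the attached patches implement), the ZK listener only enqueues a marker and a single watcher thread drains the queue, so a burst of notifications results in one rebalance:

      import java.util.concurrent.LinkedBlockingQueue

      // Hypothetical sketch of batching ZK notifications, not the actual patch.
      class RebalanceNotificationQueue(maxQueuedNotifications: Int) {
        private val notifications = new LinkedBlockingQueue[AnyRef](maxQueuedNotifications)
        private val marker = new AnyRef

        // Called from the ZkClient listener callbacks; never rebalances inline.
        def notifyRebalance(): Unit = {
          notifications.offer(marker)
        }

        // Run by a dedicated watcher thread.
        def watcherLoop(rebalance: () => Unit): Unit = {
          while (true) {
            // Block until at least one notification arrives ...
            notifications.take()
            // ... then drain whatever has piled up, so several ZK notifications
            // trigger only one rebalance.
            while (notifications.poll() != null) {}
            rebalance()
          }
        }
      }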

      1. kafka-265.patch
        3 kB
        Jun Rao
      2. kafka-265_v2.patch
        2 kB
        Jun Rao
      3. kafka-265_await_fix.patch
        0.6 kB
        Jun Rao
      4. kafka-265_shutdown.patch
        1 kB
        Jun Rao

        Activity

        Transition                   Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Patch Available      2d 4h 14m               1                 Jun Rao         09/Feb/12 02:14
        Patch Available -> Resolved  4d 20h 17m              1                 Jun Rao         13/Feb/12 22:31
        Jun Rao added a comment -

        Committed the fix for both Condition and shutdown to trunk.

        Neha Narkhede added a comment -

        +1. Doesn't the test in system_test/broker_failure catch this?

        Jun Rao made changes -
        Attachment kafka-265_shutdown.patch [ 12514819 ]
        Jun Rao added a comment -

        Found a new problem. The watch executor thread doesn't shut down properly. Attached another patch.
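
        For illustration only (a hedged sketch of the kind of issue a watcher thread can hit at shutdown, not the contents of kafka-265_shutdown.patch): a thread parked in Condition.await() has to be woken explicitly, e.g. by setting a stop flag and signalling before joining it.

        import java.util.concurrent.locks.ReentrantLock

        // Hypothetical sketch: a shutdown() that forgets to signal the Condition
        // leaves the watcher thread parked in await() forever, so join() never returns.
        object WatcherShutdownSketch {
          private val lock = new ReentrantLock()
          private val cond = lock.newCondition()
          @volatile private var shuttingDown = false

          private val watcher = new Thread(new Runnable {
            def run(): Unit = {
              lock.lock()
              try {
                while (!shuttingDown) cond.await() // parked here until signalled
              } finally {
                lock.unlock()
              }
            }
          })

          def main(args: Array[String]): Unit = {
            watcher.start()
            shuttingDown = true
            lock.lock()
            try {
              cond.signal() // without this signal, join() below would block forever
            } finally {
              lock.unlock()
            }
            watcher.join()
            println("watcher thread stopped cleanly")
          }
        }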

        Neha Narkhede added a comment -

        Subtle, but critical
        +1

        Jun Rao made changes -
        Attachment kafka-265_await_fix.patch [ 12514723 ]
        Jun Rao added a comment -

        Found a bug. Condition should use await/signal instead of wait/notify. Attached a patch.
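
        For reference, a minimal illustration of the pitfall (not the patch itself): java.util.concurrent.locks.Condition is used with await()/signal() while holding its Lock; calling wait() on the Condition object resolves to Object.wait() and fails unless the thread holds that object's own monitor.

        import java.util.concurrent.locks.ReentrantLock

        // Tiny demonstration: wait() on a Condition is Object.wait(), not Condition.await().
        object AwaitVsWait {
          def main(args: Array[String]): Unit = {
            val lock = new ReentrantLock()
            val cond = lock.newCondition()

            lock.lock()
            try {
              // Wrong API: we hold `lock`, not the Condition object's monitor,
              // so Object.wait() throws IllegalMonitorStateException here.
              cond.wait()
            } catch {
              case e: IllegalMonitorStateException => println("wrong call: " + e)
            } finally {
              lock.unlock()
            }
            // Correct pattern: cond.await() / cond.signal() while holding `lock`.
          }
        }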

        Jun Rao made changes -
        Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
        Resolution: Fixed [ 1 ]
        Jun Rao added a comment -

        Thanks for the review. Just committed this.

        Neha Narkhede added a comment -

        +1 for v2. Thanks for accommodating the change.

        Jun Rao made changes -
        Attachment kafka-265_v2.patch [ 12514064 ]
        Jun Rao added a comment -

        That's a good idea. Attached patch v2.

        Neha Narkhede added a comment -

        If the queue is full, the ZkClient listener can hang temporarily. This is not ideal, since ZkClient will not be able to deliver more events until a rebalance operation is completed and the queue is cleared. In practice, this might not be a big issue, but it can be easily avoided.

        I think there is an alternative solution to this problem, one that will

        1. avoid maintaining this queue
        2. reduce memory consumption in the consumer
        3. avoid adding another config option

        How about just using a boolean variable that indicates at least one pending rebalancing request? The watcher thread can use a Condition to wait while the boolean is false. The ZK listener merely sets the boolean to true and signals the Condition, so that the watcher thread can proceed with a rebalancing operation.
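
        A minimal sketch of this proposal (hypothetical names, not the code that was eventually committed): the ZK listener only flips the flag and signals, and any notifications that arrive while a rebalance is in progress collapse into a single follow-up rebalance.

        import java.util.concurrent.locks.ReentrantLock

        // Hypothetical sketch of the "boolean flag + Condition" proposal above.
        class RebalanceWatcher(rebalance: () => Unit) {
          private val lock = new ReentrantLock()
          private val cond = lock.newCondition()
          private var isWatcherTriggered = false

          // Called from the ZkClient listener callbacks: cheap, non-blocking, no queue.
          def markRebalanceNeeded(): Unit = {
            lock.lock()
            try {
              isWatcherTriggered = true
              cond.signal()
            } finally {
              lock.unlock()
            }
          }

          // Body of the dedicated watcher thread.
          def watcherLoop(): Unit = {
            while (true) {
              lock.lock()
              try {
                while (!isWatcherTriggered)
                  cond.await()
                // Reset before rebalancing: notifications arriving during the
                // rebalance set the flag again and cause exactly one more pass.
                isWatcherTriggered = false
              } finally {
                lock.unlock()
              }
              rebalance()
            }
          }
        }

        Compared with a bounded queue, this never blocks the ZkClient event thread, keeps no per-notification state, and needs no extra config option.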

        Jun Rao made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Jun Rao added a comment -

        patch attached.

        Jun Rao made changes -
        Attachment kafka-265.patch [ 12513895 ]
        Neha Narkhede created issue -

          People

          • Assignee: Jun Rao
          • Reporter: Neha Narkhede
          • Votes: 0
          • Watchers: 0


              Time Tracking

              Original Estimate: 168h
              Remaining Estimate: 168h
              Time Spent: Not Specified
