Kafka / KAFKA-2978

Topic partition is sometimes not consumed after consumer group rebalancing

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.9.0.0
    • Fix Version/s: 0.9.0.1
    • Component/s: consumer, core
    • Labels:
      None
    • Flags:
      Important

      Description

      Hi there, we are evaluating Kafka 0.9 to determine whether it is stable enough and ready for production. We wrote a tool that verifies that each produced message is also properly consumed. We found the issue described below while stress-testing Kafka with this tool.
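
      For illustration only, the kind of check such a tool performs can be sketched as follows. This is not the reporter's tool: the `DeliveryChecker` class, its method names, and the message ids are hypothetical, and no real Kafka client is involved. It only shows the bookkeeping of matching produced messages against consumed ones per partition.

```python
from collections import defaultdict


class DeliveryChecker:
    """Hypothetical bookkeeping: remembers produced message ids until they are consumed."""

    def __init__(self):
        # partition -> set of message ids produced but not yet seen by any consumer
        self.pending = defaultdict(set)

    def on_produced(self, partition, message_id):
        self.pending[partition].add(message_id)

    def on_consumed(self, partition, message_id):
        # discard() tolerates duplicates and ids that were never recorded
        self.pending[partition].discard(message_id)

    def unconsumed(self):
        # Report only partitions that still have outstanding messages
        return {p: sorted(ids) for p, ids in self.pending.items() if ids}


checker = DeliveryChecker()
for partition, message_id in [(0, "a"), (1, "b"), (1, "c")]:
    checker.on_produced(partition, message_id)
checker.on_consumed(0, "a")
checker.on_consumed(1, "b")
print(checker.unconsumed())  # {1: ['c']}
```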

      Adding more and more consumers to a consumer group may result in unsuccessful rebalancing. Data from one or more partitions is then not consumed and is effectively unavailable to the client application (e.g. for 15 minutes). The situation can be resolved externally by touching the consumer group again (adding or removing a consumer), which forces another rebalance that may or may not be successful.
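
      A minimal sketch of how such a stalled partition could be flagged from the outside, assuming the application records the time of the last consumed message per partition. The function name, the timestamps, and the 900-second (15-minute) threshold are assumptions chosen for illustration, not part of Kafka or of the reporter's tool.

```python
def stalled_partitions(last_consumed_at, now, max_idle_seconds):
    """Return partitions whose last consumed message is older than the idle limit.

    last_consumed_at: dict mapping partition -> timestamp (seconds) of the last consumed message
    now: current timestamp in seconds
    """
    return sorted(
        partition
        for partition, timestamp in last_consumed_at.items()
        if now - timestamp > max_idle_seconds
    )


# Hypothetical timestamps: partitions 0 and 1 were read recently, partition 2 was not
last_seen = {0: 1895.0, 1: 1890.0, 2: 900.0}
print(stalled_partitions(last_seen, now=1900.0, max_idle_seconds=900))  # [2]
```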

      Significantly higher CPU utilization was observed in such cases (rising from about 3% to 17%). According to htop and profiling with jvisualvm, the increased CPU utilization occurs in both the affected consumer and the Kafka broker.

      Jvisualvm indicates that the issue may be related to KAFKA-2936 (see its screenshots in the GitHub repo below), but I'm very unsure. I also don't know whether the issue is in the consumer or the broker, because both are affected and I don't know Kafka internals.

      The issue is not deterministic, but it can be easily reproduced within a few minutes just by starting more and more consumers. More parallelism across multiple CPUs probably gives the issue more chances to appear.

      The tool itself together with very detailed instructions for quite reliable reproduction of the issue and initial analysis are available here:

      My colleague was able to independently reproduce the issue by following the instructions above. If you have any questions or need any help with the tool, just let us know. This issue is a blocker for us.

              People

              • Assignee:
                hachikuji Jason Gustafson
                Reporter:
                turek@avast.com Michal Turek
                Reviewer:
                Guozhang Wang
              • Votes:
                0
                Watchers:
                5
