Kafka / KAFKA-2331

Kafka does not spread partitions in a topic among all consumers evenly

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: 0.8.1.1
    • Fix Version/s: None
    • Component/s: clients, consumer
    • Labels: None

    Description

      I want to have 1 topic with 10 partitions. I am using the default configuration of Kafka. I created 1 topic with 10 partitions using the helper script, and now I am about to produce messages to it.

      The thing is that even though all partitions are indeed consumed, some consumers have more than 1 partition assigned, even though the number of consumer threads equals the number of partitions in the topic. Hence some threads are idle.

      Let me describe it in more detail.

      I know the common guidance that you need one consumer thread per partition. I want to be able to commit offsets per partition, and this is possible only when I have 1 thread per consumer connector per partition (I am using the high-level consumer).

      So I create 10 threads. In each thread I call Consumer.createJavaConsumerConnector() and then do this:

      topicCountMap.put("mytopic", 1);
      so in the end I have 1 iterator, which consumes messages from 1 partition.

      When I do this 10 times, I have 10 consumers, one consumer per thread per partition, so I can commit offsets independently per partition. If I put a number other than 1 in the topic map, I would end up with more than 1 consumer thread for that topic in the given consumer instance, so committing offsets with that consumer instance would commit them for all of its threads, which is not desired.

      But when I run these consumers, only 7 of them are involved. The other consumer threads seem to be idle, and I do not know why.

      I am creating these consumer threads in a loop: I start the first thread (submit it to an executor service), then another, then another, and so on.

      So the scenario is that the first consumer gets all 10 partitions, then the 2nd connects and the partitions are split between these two as 5 and 5 (or something similar), and then the other threads connect.

      I understand this as partition rebalancing among all consumers: as more consumers are created, partitions are rebalanced between them, so every consumer should have some partitions to operate upon.

      But from the results I see that there are only 7 active consumers, and according to the consumed messages the partitions seem to be split 3,2,1,1,1,1,1. Yes, these 7 consumers cover all 10 partitions, but why do the consumers with more than 1 partition not split and give partitions to the remaining 3 consumers?
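
      For comparison, here is a minimal sketch of the even split I would expect after each consumer joins. It is not Kafka code: it assumes a range-style assignment where every consumer gets floor(partitions / consumers) partitions and the first (partitions % consumers) consumers get one extra, and it assumes every consumer is in the same group and has completed the rebalance. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeAssignmentSketch {

    // Hypothetical helper, not part of Kafka: how many partitions each
    // consumer would own under an even, range-style split.
    static List<Integer> partitionCounts(int partitions, int consumers) {
        int base = partitions / consumers;   // every consumer gets at least this many
        int extra = partitions % consumers;  // the first `extra` consumers get one more
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            counts.add(base + (i < extra ? 1 : 0));
        }
        return counts;
    }

    public static void main(String[] args) {
        int partitions = 10;
        // Simulate consumers joining one by one, as in the loop described above.
        for (int consumers = 1; consumers <= 10; consumers++) {
            System.out.println(consumers + " consumer(s): "
                    + partitionCounts(partitions, consumers));
        }
    }
}
```

      Under these assumptions, 2 consumers get 5 and 5, 7 consumers get 2,2,2,1,1,1,1, and 10 consumers get 1 partition each, so no thread would be idle.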

      I am wondering what is happening with the remaining 3 threads and why they do not "grab" partitions from consumers which have more than 1 partition assigned.

          People

            Assignee: Unassigned
            Reporter: Stefan Miklosovic (smiklosovic)