Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-22839

Assign-evenly kafkaTopicPartitions of multiple topics to flinkKafkaConsumer subtask

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • 1.11.0
    • None
    • Connectors / Kafka
    • None

    Description

      Now ,with flink1.11 kafka connecotr,when we consume multiple kafka topics by one flinkkafkaconsumer , when we set the consumer parallelism equals with the total partitions count of multiple topic , make a decision each topic partition consume by one kafka consumer, so each topic partition count is less than the subtask count. But,the problem is that currently, some subtask is total free while someothers workload is very high, this problem is caused by that the partitionAssigner assign partion of earch topic indepently currently.

      Following is one example: Target topics: topi1, topic2 ,topic3 ,topic4.  each has 3 partitions. In our job we consume the 4 topic by one consumer , our flink standalone cluster got 9 taskworkers on different nodes. we want balance the workload as much as possible, so we  set the paralelism of flinkkafkaconsumer to 12. from the UI we notice that the 0-5 subtask is free without partition assigned, the total 12 partiton is assigned to 6-11 subtask. We learned the source code of KafkaTopicPartitionAssigner to explain this phenomenon ,and then we extend one more partition assign strategy which can deal with the need we describe up, this stategy can evenly assign partiton from multiple topic grobally to subtask of consumer. we want to contibute to flink, so someone has the same requirement can use it directlly.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              xiaolongsy2008 Xu xiaolong
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 12h
                  12h
                  Remaining:
                  Remaining Estimate - 12h
                  12h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified