Flink / FLINK-7143

Partition assignment for Kafka consumer is not stable


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.3.2, 1.4.0
    • Component/s: Connectors / Kafka
    • Labels: None

    Description

      Important Notice:

      Upgrading jobs from 1.2.x exhibits no known problems. Jobs from 1.3.0 and 1.3.1 with incorrect partition assignments cannot be automatically fixed by upgrading to Flink 1.3.2 via a savepoint, because the upgraded version would resume the wrong partition assignment from the savepoint. A workaround is to assign a different uid to the Kafka source (so its offsets won't be resumed from the savepoint) and let it start from the latest offsets committed to Kafka instead. Note that this may violate exactly-once semantics and introduce some duplicates, because Kafka's committed offsets are not guaranteed to be 100% up to date with Flink's internal offset tracking. To maximize the alignment between the offsets in Kafka and those tracked by Flink, we suggest stopping the 1.3.x job via the "cancel with savepoint" command (https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html#cancel-job-with-savepoint) during the upgrade process.
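
      For illustration, here is a minimal sketch of that workaround, assuming the Kafka 0.10 connector; the topic, group id, bootstrap servers, and the new uid value are placeholders:

        import java.util.Properties;

        import org.apache.flink.streaming.api.datastream.DataStream;
        import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
        import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
        import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

        public class KafkaUidWorkaround {
            public static void main(String[] args) throws Exception {
                StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

                // Placeholder connection settings; adjust to the actual cluster.
                Properties props = new Properties();
                props.setProperty("bootstrap.servers", "kafka:9092");
                props.setProperty("group.id", "my-consumer-group");

                FlinkKafkaConsumer010<String> consumer =
                    new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);
                // Start from the offsets committed to Kafka for this group (this is the default).
                consumer.setStartFromGroupOffsets();

                // Giving the source a new uid means the savepoint's Kafka state no longer
                // matches this operator, so the wrong partition assignment is not restored.
                DataStream<String> stream = env
                    .addSource(consumer)
                    .uid("kafka-source-v2");

                stream.print();
                env.execute("kafka-uid-workaround");
            }
        }

      When restoring from the savepoint with a changed uid, the now-unmatched Kafka source state in the savepoint has to be explicitly allowed to be dropped (the CLI's --allowNonRestoredState option).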

      Original Issue Description

      While deploying the Flink 1.3 release to hundreds of routing jobs, we found some issues with partition assignment for the Kafka consumer: some partitions weren't assigned and some partitions got assigned more than once.

      Here is the bug introduced in Flink 1.3:

        protected static void initializeSubscribedPartitionsToStartOffsets(...) {
            ...
            for (int i = 0; i < kafkaTopicPartitions.size(); i++) {
                // BUG: the position i in the list, not the partition id,
                // decides which subtask gets the partition
                if (i % numParallelSubtasks == indexOfThisSubtask) {
                    if (startupMode != StartupMode.SPECIFIC_OFFSETS) {
                        subscribedPartitionsToStartOffsets.put(kafkaTopicPartitions.get(i), startupMode.getStateSentinel());
                    }
            ...
        }


      The bug is that the array index i is used for the modulo against numParallelSubtasks. If kafkaTopicPartitions has a different order on different subtasks, the assignment is not stable across subtasks, which creates the assignment issues mentioned earlier.
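
      To make the instability concrete, here is a minimal, self-contained sketch (plain integers stand in for Flink's KafkaTopicPartition objects; the orders and parallelism are made up): two subtasks that enumerate the same three partitions in different orders end up double-reading one partition and skipping another.

        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.List;

        public class UnstableAssignmentDemo {

            // The 1.3.0/1.3.1 logic: assignment keyed on the position in the list.
            static List<Integer> indexBasedAssignment(
                    List<Integer> partitionsAsSeenBySubtask, int numParallelSubtasks, int indexOfThisSubtask) {
                List<Integer> assigned = new ArrayList<>();
                for (int i = 0; i < partitionsAsSeenBySubtask.size(); i++) {
                    if (i % numParallelSubtasks == indexOfThisSubtask) {
                        assigned.add(partitionsAsSeenBySubtask.get(i));
                    }
                }
                return assigned;
            }

            public static void main(String[] args) {
                // The same three partitions, enumerated in different orders by two subtasks.
                List<Integer> seenBySubtask0 = Arrays.asList(0, 1, 2);
                List<Integer> seenBySubtask1 = Arrays.asList(2, 0, 1);

                System.out.println(indexBasedAssignment(seenBySubtask0, 2, 0)); // [0, 2]
                System.out.println(indexBasedAssignment(seenBySubtask1, 2, 1)); // [0]
                // Partition 0 is read by both subtasks; partition 1 is read by neither.
            }
        }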

      The fix is also very simple: use the partition id for the modulo, i.e. if (kafkaTopicPartitions.get(i).getPartition() % numParallelSubtasks == indexOfThisSubtask). That results in a stable assignment across subtasks that is independent of the ordering in the array.
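
      Applying that rule to the same made-up scenario as above (again with plain integers standing in for the real partition objects) gives every partition exactly one owner, regardless of the order in which each subtask sees the list:

        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.List;

        public class StableAssignmentDemo {

            // Fixed rule: key the assignment on the partition id, not the list position.
            static List<Integer> partitionIdBasedAssignment(
                    List<Integer> partitions, int numParallelSubtasks, int indexOfThisSubtask) {
                List<Integer> assigned = new ArrayList<>();
                for (int partitionId : partitions) {
                    if (partitionId % numParallelSubtasks == indexOfThisSubtask) {
                        assigned.add(partitionId);
                    }
                }
                return assigned;
            }

            public static void main(String[] args) {
                System.out.println(partitionIdBasedAssignment(Arrays.asList(0, 1, 2), 2, 0)); // [0, 2]
                System.out.println(partitionIdBasedAssignment(Arrays.asList(2, 0, 1), 2, 1)); // [1]
                // Partitions 0 and 2 go to subtask 0, partition 1 goes to subtask 1,
                // no matter how each subtask's list is ordered.
            }
        }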

      Marking this as a blocker because of its impact.


          People

            Assignee: Tzu-Li (Gordon) Tai (tzulitai)
            Reporter: Steven Zhen Wu (stevenz3wu)
            Votes: 0
            Watchers: 7
