KAFKA-5600

Group loading regression causing stale metadata/offsets cache

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.10.2.1, 0.11.0.0
    • Fix Version/s: 0.11.0.1, 1.0.0
    • Component/s: core
    • Environment:
      any

      Description

      After a long investigation we found a problem in Kafka.
      When a __consumer_offsets partition has been split into more than one log segment and the broker is restarted and has to reload offsets, consumers start at the wrong position if metadata and offset events are present in both segments.

      Reproduction:
      1.) Start ZooKeeper and Kafka as shipped in the archive

      KAFKA_HEAP_OPTS="-Xmx256M -Xms256M" bin/zookeeper-server-start.sh config/zookeeper.properties
      KAFKA_HEAP_OPTS="-Xmx256M -Xms256M" bin/kafka-server-start.sh config/server.properties
      

      2.) Start KafkaErrorProducer.java, which adds 1M log entries to the topic test
      3.) Start KafkaErrorConsumer.java, which starts a consumer, reads 100 entries one by one, and then closes the consumer. This leads to a second segment in /tmp/kafka-logs/__consumer_offsets-27. This step takes some time (around 5 minutes). Closing the consumer is needed so that metadata events end up in the segments too.
      4.) Stop and restart the Kafka broker
      5.) Start any consumer on topic test with group testgroup

      bin/kafka-console-consumer.sh --from-beginning --bootstrap-server localhost:9092 --topic test --consumer-property group.id=testgroup
      

      Actual:
      the consumer starts at the segment boundary
      Expected:
      the consumer starts at the end

      The reason for this behavior is a misplaced closing brace of the while loop in GroupMetadataManager#loadGroupsAndOffsets, introduced with commit https://github.com/apache/kafka/commit/5bd06f1d542e6b588a1d402d059bc24690017d32
      I will prepare a pull request.

      Edit: The issue can happen if there are multiple reads from the same segment, see https://github.com/apache/kafka/pull/3538#discussion_r127759694
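
For illustration, here is a minimal, self-contained Java sketch of how a misplaced closing brace in a chunked loading loop could leave a stale cache entry. It is not the Kafka source (the real code is Scala), and it is only one plausible reading of the bug. The class GroupLoadSketch, the Commit type, the readSize parameter, and the put-if-absent cache behavior are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a chunked offsets load; not the actual Kafka code.
public class GroupLoadSketch {

    // One simulated offset commit: the group and the offset it committed.
    static final class Commit {
        final String group;
        final long offset;

        Commit(String group, long offset) {
            this.group = group;
            this.offset = offset;
        }
    }

    // Split the log into reads of at most readSize records each (readSize > 0).
    static List<List<Commit>> reads(List<Commit> log, int readSize) {
        List<List<Commit>> chunks = new ArrayList<>();
        for (int i = 0; i < log.size(); i += readSize) {
            chunks.add(log.subList(i, Math.min(i + readSize, log.size())));
        }
        return chunks;
    }

    // Assumption: the cache keeps the first value it sees per group (put-if-absent).
    static void publish(Map<String, Long> loaded, Map<String, Long> cache) {
        for (Map.Entry<String, Long> e : loaded.entrySet()) {
            cache.putIfAbsent(e.getKey(), e.getValue());
        }
    }

    // Buggy shape: the loop's closing brace sits too late, so publishing runs after every read.
    static Map<String, Long> loadBuggy(List<Commit> log, int readSize) {
        Map<String, Long> loaded = new HashMap<>();
        Map<String, Long> cache = new HashMap<>();
        for (List<Commit> chunk : reads(log, readSize)) {
            for (Commit c : chunk) {
                loaded.put(c.group, c.offset); // later commits overwrite earlier ones
            }
            publish(loaded, cache); // misplaced: the state after the first read wins
        }
        return cache;
    }

    // Fixed shape: read the whole log first, then publish once.
    static Map<String, Long> loadFixed(List<Commit> log, int readSize) {
        Map<String, Long> loaded = new HashMap<>();
        Map<String, Long> cache = new HashMap<>();
        for (List<Commit> chunk : reads(log, readSize)) {
            for (Commit c : chunk) {
                loaded.put(c.group, c.offset);
            }
        }
        publish(loaded, cache);
        return cache;
    }

    public static void main(String[] args) {
        List<Commit> log = new ArrayList<>();
        for (long offset = 10; offset <= 50; offset += 10) {
            log.add(new Commit("testgroup", offset));
        }
        System.out.println("buggy: " + loadBuggy(log, 2)); // prints "buggy: {testgroup=20}"
        System.out.println("fixed: " + loadFixed(log, 2)); // prints "fixed: {testgroup=50}"
    }
}
```

In this model the buggy variant returns the offset that was current at the end of the first read, which resembles the reported symptom. When the whole log fits into a single read, both variants agree, which is consistent with the note that the problem depends on multiple reads.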

        Attachments

        1. KafkaErrorProducer.java
          0.8 kB
          Jan Burkhardt
        2. KafkaErrorConsumer.java
          1 kB
          Jan Burkhardt

              People

              • Assignee:
                bjrke Jan Burkhardt
                Reporter:
                bjrke Jan Burkhardt
              • Votes:
                0
                Watchers:
                8
