Kafka / KAFKA-4446

If the consumer offsets topic is created with fewer replicas than min.insync.replicas, consuming is not possible

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.10.1.0
    • Fix Version/s: None
    • Component/s: core
    • Labels: None
    • Environment: Ubuntu 16.04

    Description

      This is an edge case, but it has a high impact. I have seen this issue several times when creating a new cluster of Kafka brokers and consuming components in an automated deployment. Full details of the chain of events are given below. I expect it could also occur if the first consume from a Kafka cluster happens while some nodes are in a failure state.

      It appears that while the consumer offsets topic can be created with a replication factor of only 1 or 2 (for example, if only 1 Kafka broker is alive when it is created), min.insync.replicas is still applied, and if that is higher than the replication factor it becomes impossible to consume any messages. It seems to me that when a topic is created explicitly with a replication factor lower than min.insync.replicas, that rule should not be applied, as it makes the topic unusable. In my experience this is the case for topics I've created myself, but the consumer offsets topic appears to behave differently.
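As a rough illustration of the condition described above, the following sketch flags topics whose replication factor is lower than min.insync.replicas. It is not Kafka code: the function name, the input dictionary and the `events` topic are invented for the example, and the replication factors would have to be collected separately (for instance from the output of the topic describe tool).

```python
def topics_below_min_isr(replication_factors, min_insync_replicas):
    """Return topics whose replica count can never reach min.insync.replicas.

    replication_factors maps topic name -> replication factor (hypothetical
    input, gathered by whatever means is available).
    """
    return sorted(
        topic
        for topic, rf in replication_factors.items()
        if rf < min_insync_replicas
    )


# Hypothetical cluster state matching this report: the offsets topic was
# created while only one broker was alive; "events" is an invented topic.
topics = {"__consumer_offsets": 1, "events": 3}
print(topics_below_min_isr(topics, min_insync_replicas=2))
```

A check like this could run at the end of an automated deployment, before any consuming component is started.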

      Detailed scenario:

      • Kafka is used as an event messaging pipeline, around which a number of components are deployed that produce and consume messages.
      • Deploying a new environment brings up all components at the same time, including a 3-node Kafka cluster and some event-driven components.
      • Our configuration sets min.insync.replicas=2.
      • Kafka node 1 opens its listener port before the other two brokers come up.
      • One of the components subscribes to a pre-created topic and attempts to consume from it for the first time, also before the other two Kafka brokers come up.
      • Kafka node 1 creates the consumer offsets topic with replication factor 1, as it is the only live broker. This is expected behaviour as per the documentation for offsets.topic.replication.factor.
      • When Kafka node 1 attempts to write a consumer offset message to the topic, it fails with a repeating error message and never recovers, because there is only 1 member of the ISR but min.insync.replicas is 2. The repeating error message is:
        kafka2_1 | org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync replicas for partition [__consumer_offsets,31] is [1], below required minimum [2]
      • No consumers can consume from this cluster any more.
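The chain of events above can be modelled in a few lines. This is a simplified sketch of the behaviour as described in this report, not the broker's actual logic. It assumes a configured offsets.topic.replication.factor of 3 (the report does not state the value), that every live replica is in sync, that offset commits must be acknowledged by all in-sync replicas, and that the replica set is not reassigned after the topic is created. All function and class names are made up for the illustration.

```python
class NotEnoughReplicas(Exception):
    """Stand-in for org.apache.kafka.common.errors.NotEnoughReplicasException."""


def offsets_topic_replication_factor(configured_rf, live_brokers):
    # As described in the report: the topic gets as many replicas as there
    # are live brokers at creation time, capped at the configured factor.
    return min(configured_rf, live_brokers)


def commit_offset(replication_factor, live_brokers, min_insync_replicas):
    # The ISR can never be larger than the replica set, however many
    # brokers are alive (assumes every live replica is in sync).
    isr_size = min(replication_factor, live_brokers)
    if isr_size < min_insync_replicas:
        raise NotEnoughReplicas(
            f"Number of insync replicas is [{isr_size}], "
            f"below required minimum [{min_insync_replicas}]"
        )
    return isr_size


# Only broker 1 is up when the first consumer arrives (configured_rf=3 is
# an assumed value).
rf = offsets_topic_replication_factor(configured_rf=3, live_brokers=1)

# Brokers 2 and 3 start later, but in this model the replica set was fixed
# when the topic was created, so the commit keeps failing.
for live in (1, 2, 3):
    try:
        commit_offset(rf, live, min_insync_replicas=2)
        print(f"{live} live broker(s): commit ok")
    except NotEnoughReplicas as err:
        print(f"{live} live broker(s): {err}")
```

In this model the commit fails for every number of live brokers, which matches the "never recovers" behaviour seen in the deployment.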

      (FYI 0.10.1.0 is still listed as unreleased in JIRA, but the project front page says it's the latest release)

            People

              Assignee: Unassigned
              Reporter: Oliver Deakin (oli.deakin)