XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.7.0
    • 3.8.0
    • None
    • None

    Description

      While running test 0113 of librdkafka, subtest _u_multiple_subscription_changes_ have received this error saying that a partition is assigned to two members at the same time.

      Error: C_6#consumer-9 is assigned rdkafkatest_rnd550f20623daba04c_0113u_2 [0] which is already assigned to consumer C_5#consumer-8 

      I've reconstructed this sequence:

      C_5 SUBSCRIBES TO T1

      %7|1705403451.561|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 6, group instance id "(null)", current assignment "", subscribe topics "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"

      C_5 ASSIGNMENT CHANGES TO T1-P7, T1-P8, T1-P12

      [2024-01-16 12:10:51,562] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw transitioned from CurrentAssignment(memberEpoch=6, previousMemberEpoch=0, targetMemberEpoch=6, state=assigning, assignedPartitions={}, partitionsPendingRevocation={}, partitionsPendingAssignment={IKXGrFR1Rv-Qes7Ummas6A=[3, 12]}) to CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=14, state=stable, assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, partitionsPendingRevocation={}, partitionsPendingAssignment={}). (org.apache.kafka.coordinator.group.GroupMetadataManager)

       

      C_5 RECEIVES TARGET ASSIGNMENT

      %7|1705403451.565|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat response received target assignment "(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], (null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"

       

      C_5 ACKS TARGET ASSIGNMENT

      %7|1705403451.566|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id "NULL", current assignment "rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[7], rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[8], rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[12]", subscribe topics "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"
      %7|1705403451.567|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat response received target assignment "(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], (null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"

       

      C_5 SUBSCRIBES TO T1,T2: T1 partitions are revoked, 5 T2 partitions are pending 

      %7|1705403452.612|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id "NULL", current assignment "NULL", subscribe topics "rdkafkatest_rnd550f20623daba04c_0113u_2((null))[-1], rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"
      [2024-01-16 12:10:52,615] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw updated its subscribed topics to: [rdkafkatest_rnd550f20623daba04c_0113u_2, rdkafkatest_rnd5a91902462d61c2e_0113u_1]. (org.apache.kafka.coordinator.group.GroupMetadataManager)
      [2024-01-16 12:10:52,616] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=14, state=stable, assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, partitionsPendingRevocation={}, partitionsPendingAssignment={}) to CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=16, state=revoking, assignedPartitions={}, partitionsPendingRevocation={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 8, 9]}). (org.apache.kafka.coordinator.group.GroupMetadataManager)
      %7|1705403452.618|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat response received target assignment ""

      C_5 FINISHES REVOCATION

      %7|1705403452.618|CGRPJOINSTATE|C_5#consumer-8| [thrd:main]: Group "rdkafkatest_rnd53b4eb0c2de343_0113u" changed join state wait-assign-call -> steady (state up)

      C_5 ACKS REVOCATION, RECEIVES T2-P0,T2-P1,T2-P2

       

      %7|1705403452.618|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id "NULL", current assignment "", subscribe topics "NULL"
      [2024-01-16 12:10:52,618] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=16, state=revoking, assignedPartitions={}, partitionsPendingRevocation={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 8, 9]}) to CurrentAssignment(memberEpoch=16, previousMemberEpoch=14, targetMemberEpoch=16, state=assigning, assignedPartitions={EnZMikZURKiUoxZf0rozaA=[0, 1, 2]}, partitionsPendingRevocation={}, partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[8, 9]}). (org.apache.kafka.coordinator.group.GroupMetadataManager)
      %7|1705403452.620|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat response received target assignment "(null)(EnZMikZURKiUoxZf0rozaA)[0], (null)(EnZMikZURKiUoxZf0rozaA)[1], (null)(EnZMikZURKiUoxZf0rozaA)[2]"
      
      %7|1705403452.621|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 16, group instance id "NULL", current assignment "rdkafkatest_rnd550f20623daba04c_0113u_2(EnZMikZURKiUoxZf0rozaA)[0], rdkafkatest_rnd550f20623daba04c_0113u_2(EnZMikZURKiUoxZf0rozaA)[1], rdkafkatest_rnd550f20623daba04c_0113u_2(EnZMikZURKiUoxZf0rozaA)[2]", subscribe topics "NULL"
      %7|1705403452.622|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat response received target assignment "(null)(EnZMikZURKiUoxZf0rozaA)[0], (null)(EnZMikZURKiUoxZf0rozaA)[1], (null)(EnZMikZURKiUoxZf0rozaA)[2]"
      

      LATER C_6 SUBSCRIBES TO T2,T1, IT RECEIVES T1-P9,T1-P13,T1-P15 AND T2-P0. T2-P0 is assigned without being revoked from C_5.

      %7|1705403453.612|HEARTBEAT|C_6#consumer-9| [thrd:main]: GroupCoordinator/1: Heartbeat of member id "ss189cxiQ4q_cyNB1aNrfA", group id "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id "NULL", current assignment "NULL", subscribe topics "rdkafkatest_rnd550f20623daba04c_0113u_2((null))[-1], rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"
      [2024-01-16 12:10:53,612] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member ss189cxiQ4q_cyNB1aNrfA updated its subscribed topics to: [rdkafkatest_rnd550f20623daba04c_0113u_2, rdkafkatest_rnd5a91902462d61c2e_0113u_1]. (org.apache.kafka.coordinator.group.GroupMetadataManager)
      [2024-01-16 12:10:53,613] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Bumped group epoch to 17. (org.apache.kafka.coordinator.group.GroupMetadataManager)
      [2024-01-16 12:10:53,614] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Computed a new target assignment for epoch 17 with 'uniform' assignor: {SWlBlUvBQpqdl93hD6SJOA=Assignment(error=0, partitions={IKXGrFR1Rv-Qes7Ummas6A=[2, 4, 5]}, metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 lim=0 cap=0])), RaTCu6RXQH-FiSl95iZzdw=Assignment(error=0, partitions={EnZMikZURKiUoxZf0rozaA=[1, 2, 8, 9, 10]}, metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 lim=0 cap=0])), mhzGxxF2R-Oa3BQGotzx7A=Assignment(error=0, partitions={EnZMikZURKiUoxZf0rozaA=[3, 4, 5, 6, 7]}, metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 lim=0 cap=0])), i7qqdVw4S0KZR5J5Z0XvoQ=Assignment(error=0, partitions={IKXGrFR1Rv-Qes7Ummas6A=[1, 3, 7]}, metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 lim=0 cap=0])), d6cbfWsfTt-7XdYU23MNaw=Assignment(error=0, partitions={IKXGrFR1Rv-Qes7Ummas6A=[10, 11, 14]}, metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 lim=0 cap=0])), ss189cxiQ4q_cyNB1aNrfA=Assignment(error=0, partitions={IKXGrFR1Rv-Qes7Ummas6A=[8, 13, 15], EnZMikZURKiUoxZf0rozaA=[0]}, metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 lim=0 cap=0])), 2vIQ0F6NRlOd9RgnVKtuDg=Assignment(error=0, partitions={IKXGrFR1Rv-Qes7Ummas6A=[0, 6, 9, 12]}, metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 lim=0 cap=0])), pOvUx45IT-CJDvBQlfvVbQ=Assignment(error=0, partitions={EnZMikZURKiUoxZf0rozaA=[11, 12, 13, 14, 15]}, metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 lim=0 cap=0]))}. (org.apache.kafka.coordinator.group.GroupMetadataManager)
      [2024-01-16 12:10:53,614] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member ss189cxiQ4q_cyNB1aNrfA transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=9, targetMemberEpoch=14, state=stable, assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[13, 15]}, partitionsPendingRevocation={}, partitionsPendingAssignment={}) to CurrentAssignment(memberEpoch=17, previousMemberEpoch=14, targetMemberEpoch=17, state=stable, assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[8, 13, 15], EnZMikZURKiUoxZf0rozaA=[0]}, partitionsPendingRevocation={}, partitionsPendingAssignment={}). (org.apache.kafka.coordinator.group.GroupMetadataManager)
      [0m%7|1705403453.616|HEARTBEAT|C_6#consumer-9| [thrd:main]: GroupCoordinator/1: Heartbeat response received target assignment "(null)(IKXGrFR1Rv+Qes7Ummas6A)[8], (null)(IKXGrFR1Rv+Qes7Ummas6A)[13], (null)(IKXGrFR1Rv+Qes7Ummas6A)[15], (null)(EnZMikZURKiUoxZf0rozaA)[0]"

      I also see in consecutive broker logs that the same partition T2-P0, but also T2-P1 and T2-p2, is assigned to C_5 (RaTCu6RXQH-FiSl95iZzdw) and pending revocation from a third one (member id mhzGxxF2R-Oa3BQGotzx7A)

      [2024-01-16 12:10:52,618] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=16, state=revoking, assignedPartitions={}, partitionsPendingRevocation={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 8, 9]}) to CurrentAssignment(memberEpoch=16, previousMemberEpoch=14, targetMemberEpoch=16, state=assigning, assignedPartitions={EnZMikZURKiUoxZf0rozaA=[0, 1, 2]}, partitionsPendingRevocation={}, partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[8, 9]}). (org.apache.kafka.coordinator.group.GroupMetadataManager)
      [2024-01-16 12:10:52,620] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member mhzGxxF2R-Oa3BQGotzx7A transitioned from CurrentAssignment(memberEpoch=15, previousMemberEpoch=11, targetMemberEpoch=15, state=stable, assignedPartitions={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 3, 4, 5, 6, 7]}, partitionsPendingRevocation={}, partitionsPendingAssignment={}) to CurrentAssignment(memberEpoch=15, previousMemberEpoch=11, targetMemberEpoch=16, state=revoking, assignedPartitions={EnZMikZURKiUoxZf0rozaA=[3, 4, 5, 6, 7]}, partitionsPendingRevocation={EnZMikZURKiUoxZf0rozaA=[0, 1, 2]}, partitionsPendingAssignment={}). (org.apache.kafka.coordinator.group.GroupMetadataManager)

      NOTE: error handling in my current branch is disabled because it needs to be integrated, there is an assert that fails the tests if any error happens, so effects of retriable errors can be excluded.
      The coordinator was broker1.
      I attach the broker properties too.

      Attachments

        1. broker1.log
          814 kB
          Emanuele Sabellico
        2. broker2.log
          1.41 MB
          Emanuele Sabellico
        3. broker3.log
          1.33 MB
          Emanuele Sabellico
        4. librdkafka.log
          636 kB
          Emanuele Sabellico
        5. server.properties
          7 kB
          Emanuele Sabellico
        6. server1.properties
          7 kB
          Emanuele Sabellico
        7. server2.properties
          7 kB
          Emanuele Sabellico

        Issue Links

          Activity

            People

              dajac David Jacot
              emasab Emanuele Sabellico
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: