  1. Kafka
  2. KAFKA-14048 The Next Generation of the Consumer Rebalance Protocol
  3. KAFKA-16185

Fix client reconciliation of same assignment received in different epochs


Details

    Description

      The client state machine is intended to always reconcile whatever assignment it has pending and send an ack for it. This does not work as expected when the same assignment is received in different epochs.

      1 - The client might get stuck JOINING/RECONCILING with a pending (delayed) reconciliation when it receives the same assignment in a new epoch (e.g. after being FENCED). The first time it receives the assignment it takes no action, because it already has that assignment pending reconciliation. When the reconciliation completes, the client discards the result because the epoch changed, which is wrong. Note that after sending the assignment once with the new epoch, the broker continues to send null assignments.

      Here is a sample sequence leading to the client stuck JOINING:

      • client joins, epoch 0
      • client receives assignment tp1, stuck RECONCILING, epoch 1
      • member gets FENCED on the coordinator, coordinator bumps epoch to 2
      • client tries to rejoin (JOINING), epoch 0 provided by the client
      • new member added to the group (group epoch bumped to 3), client receives the same assignment that it is currently trying to reconcile (tp1), but with epoch 3
      • previous reconciliation completes, but the client discards the result because it notices that the memberHasRejoined (memberEpochOnReconciliationStart != memberEpoch). The client is stuck JOINING, with the server sending a null target assignment because it hasn't changed since the last one sent (tp1)
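
      The sequence above can be modeled with a small toy state machine. This is only a sketch under stated assumptions, not the actual client code: the class ReconcileSketch, its methods, and the discardOnEpochChange flag are hypothetical names introduced for illustration, and "accept the result if it still matches the target assignment" is just one possible way to avoid the stuck state.

```java
import java.util.Set;

// Toy model of the member state machine; every name here is hypothetical.
class ReconcileSketch {
    enum State { JOINING, RECONCILING, ACKNOWLEDGING }

    static final class Member {
        State state = State.JOINING;
        int memberEpoch = 0;
        Set<String> target = Set.of();        // assignment pending reconciliation
        final boolean discardOnEpochChange;   // true = behavior described in the issue

        Member(boolean discardOnEpochChange) {
            this.discardOnEpochChange = discardOnEpochChange;
        }

        // A null assignment means "unchanged since the last one sent".
        void onHeartbeatResponse(int epoch, Set<String> assignment) {
            memberEpoch = epoch;
            if (assignment != null && !assignment.equals(target)) {
                target = assignment;
                state = State.RECONCILING;
            }
            // Same assignment as the pending one: no action is taken.
        }

        void onFenced() {
            memberEpoch = 0;
            state = State.JOINING;
        }

        void onReconciliationComplete(int epochOnStart, Set<String> reconciled) {
            boolean memberHasRejoined = epochOnStart != memberEpoch;
            boolean stillTarget = reconciled.equals(target);
            if (memberHasRejoined && (discardOnEpochChange || !stillTarget)) {
                return; // result discarded, state is left untouched
            }
            state = State.ACKNOWLEDGING;
        }
    }

    // Replays the sequence from the issue and returns the final state.
    static State run(boolean discardOnEpochChange) {
        Member m = new Member(discardOnEpochChange);
        Set<String> tp1 = Set.of("tp1");
        m.onHeartbeatResponse(1, tp1);      // RECONCILING at epoch 1 (delayed)
        m.onFenced();                       // fenced, rejoins with epoch 0
        m.onHeartbeatResponse(3, tp1);      // same assignment, new epoch
        m.onReconciliationComplete(1, tp1); // delayed reconciliation finishes
        m.onHeartbeatResponse(3, null);     // broker now sends null assignments
        return m.state;
    }

    public static void main(String[] args) {
        System.out.println("discard on epoch change: " + run(true));
        System.out.println("accept if still target:  " + run(false));
    }
}
```

      In this model, run(true) ends in JOINING (the stuck case), while run(false) reaches ACKNOWLEDGING because the completed reconciliation still matches the target assignment.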

      We should end up with a test similar to the existing #testDelayedReconciliationResultDiscardedIfMemberRejoins, but covering the case where the member receives the same assignment after being fenced and rejoining.

      2 - The client does not send an ack back to the broker when it finishes a reconciliation for the same assignment that it sent in the last HB (the builder will not include the assignment). Example sequence:

      • client owns T1-1 (last HB sent included ack for T1-1)
      • client receives [T1-1, T2-1] and starts reconciling
      • client receives T1-1 (meaning T2-1 needs to be revoked)
      • ongoing reconciliation for [T1-1, T2-1] fails, so an ack is never sent for it
      • next reconciliation starts for T1-1 and completes, but the ack is not sent because the builder sees it is the same assignment it sent on the last HB, leaving the broker waiting for an ack that won't arrive.
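
      The missing ack can be illustrated with a simplified heartbeat builder that only includes the assignment when it differs from the last one sent. This is a hedged sketch, not the real builder: HeartbeatAckSketch, assignmentField, onReconciliationCompleted and the forceOnReconcile flag are all hypothetical, and forcing the field after a completed reconciliation is only one possible remedy.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

// Toy model of the heartbeat builder; every name here is hypothetical.
class HeartbeatAckSketch {
    static final class Builder {
        private Set<String> lastSent = null;    // assignment included in the last HB
        private boolean ackPending = false;     // a reconciliation finished, ack is owed
        private final boolean forceOnReconcile; // false = behavior described in the issue

        Builder(boolean forceOnReconcile) {
            this.forceOnReconcile = forceOnReconcile;
        }

        void onReconciliationCompleted() {
            ackPending = true;
        }

        // Returns the assignment to put in the next HB, or empty to omit the field.
        Optional<Set<String>> assignmentField(Set<String> owned) {
            boolean changed = !Objects.equals(owned, lastSent);
            if (changed || (forceOnReconcile && ackPending)) {
                lastSent = owned;
                ackPending = false;
                return Optional.of(owned);
            }
            return Optional.empty();
        }
    }

    // Replays the sequence from the issue; returns whether the final ack is sent.
    static boolean ackSent(boolean forceOnReconcile) {
        Builder b = new Builder(forceOnReconcile);
        Set<String> t1 = Set.of("T1-1");
        b.assignmentField(t1);           // last HB included the ack for T1-1
        // Reconciliation for [T1-1, T2-1] fails: nothing is acked for it.
        b.onReconciliationCompleted();   // reconciliation for T1-1 completes
        return b.assignmentField(t1).isPresent();
    }

    public static void main(String[] args) {
        System.out.println("compare with last HB only: ack sent = " + ackSent(false));
        System.out.println("force after reconciliation: ack sent = " + ackSent(true));
    }
}
```

      In this model, comparing only against the last HB omits the assignment (no ack), while tracking that a reconciliation has completed causes the assignment to be included even though it is unchanged.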


            People

              lucasbru Lucas Brutschy
              lianetm Lianet Magrans