KAFKA-16954

Move consumer leave operations on close to background thread


    Description

      When a consumer unsubscribes, the app thread simply triggers an Unsubscribe event, and the background thread takes care of everything: it releases the assignment (callbacks), clears the assigned partitions, and sends the leave group heartbeat (HB).

      In contrast, when a consumer is closed, these actions are split across both threads:

      • release assignment -> in the app thread by directly running the callbacks
      • clear assignment -> in app thread by updating the subscriptionState
      • send leave group HB -> in the background thread via an event LeaveOnClose 

      This split can lead to race conditions, mainly because close updates the subscription state in the app thread while other operations that rely on it may already be running in the background. For example:

      • unsubscribe in app thread (triggers background UnsubscribeEvent to revoke and leave)
      • unsubscribe fails (e.g. it is interrupted), leaving the operation running in the background thread to revoke partitions and leave
      • consumer close (will revoke and clear assignment in the app thread)
      • UnsubscribeEvent in the background may fail when it tries to revoke partitions that it no longer owns: No current assignment for partition ...
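
      The interleaving above can be replayed on a single thread to make the failure mode concrete. This is only a minimal sketch: ToySubscriptionState, CloseRaceDemo, and their methods are hypothetical names invented for illustration, not Kafka's actual classes, and the real consumer performs these steps on two threads rather than sequentially.

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for the consumer's subscription state (hypothetical, not Kafka's SubscriptionState).
class ToySubscriptionState {
    private final Set<String> assigned = new HashSet<>();

    void assign(Set<String> partitions) {
        assigned.clear();
        assigned.addAll(partitions);
    }

    void clear() {
        assigned.clear();
    }

    // Returns a snapshot copy of the currently assigned partitions.
    Set<String> assigned() {
        return new HashSet<>(assigned);
    }

    // Strict revocation: fails if a partition is no longer owned.
    void revokeStrict(Set<String> partitions) {
        for (String p : partitions) {
            if (!assigned.remove(p)) {
                throw new IllegalStateException("No current assignment for partition " + p);
            }
        }
    }

    // Guarded revocation: revokes only what is still owned and returns that subset.
    Set<String> revokeOwned(Set<String> partitions) {
        Set<String> owned = new HashSet<>(partitions);
        owned.retainAll(assigned);
        assigned.removeAll(owned);
        return owned;
    }
}

class CloseRaceDemo {
    // Replays the four steps in order on one thread, so the outcome is deterministic.
    static String replay(boolean guarded) {
        ToySubscriptionState state = new ToySubscriptionState();
        state.assign(Set.of("topic-0", "topic-1"));

        // Steps 1-2: unsubscribe captured what to revoke, then the app thread stopped waiting.
        Set<String> pendingRevocation = state.assigned();

        // Step 3: close() clears the assignment from the app thread.
        state.clear();

        // Step 4: the pending unsubscribe work finally runs "in the background".
        try {
            if (guarded) {
                return "revoked " + state.revokeOwned(pendingRevocation).size();
            }
            state.revokeStrict(pendingRevocation);
            return "revoked " + pendingRevocation.size();
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }
}
```

      Calling CloseRaceDemo.replay(false) hits the strict path and reports the failure, while replay(true) revokes nothing because the partitions are already gone. The guarded variant hides the symptom, but the state was still modified by two different owners.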

      A basic check has been added to the background thread revocation to avoid the race condition, ensuring that we only revoke partitions we own. Still, we should address the root cause, which is updating the assignment on the app thread. We should consider handling the close operation as a single LeaveOnClose event in the background. That event already takes care of revoking the partitions and clearing the assignment in the background, so the app thread does not need to do it. We only need to ensure that we processBackgroundEvents until the LeaveOnClose completes (to allow callbacks to run in the app thread).
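
      One possible shape for the proposed flow is sketched below, under heavy assumptions: LeaveOnCloseSketch, submitLeaveOnClose, the backgroundEvents queue, and the log are all hypothetical names made up for this example and do not mirror the real consumer internals. It only illustrates the idea that the background thread owns the whole leave sequence while the app thread keeps draining background events so callbacks can still run there.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "close as a single background event"; not Kafka's implementation.
class LeaveOnCloseSketch {
    // Events the background thread hands back to the app thread (here: "run this callback").
    final BlockingQueue<Runnable> backgroundEvents = new LinkedBlockingQueue<>();
    // Records "step@thread" entries so the ordering can be inspected afterwards.
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    private static String where() {
        return Thread.currentThread().getName().equals("background") ? "background" : "app";
    }

    // Background side: the whole leave sequence is driven from one thread.
    CompletableFuture<Void> submitLeaveOnClose() {
        CompletableFuture<Void> done = new CompletableFuture<>();
        Thread background = new Thread(() -> {
            try {
                // Callbacks must run in the app thread, so hand them over and wait for completion.
                CompletableFuture<Void> callbackDone = new CompletableFuture<>();
                backgroundEvents.add(() -> {
                    log.add("callback@" + where());
                    callbackDone.complete(null);
                });
                callbackDone.get(5, TimeUnit.SECONDS);

                // Only the background thread touches the assignment and sends the heartbeat.
                log.add("clear-assignment@" + where());
                log.add("leave-heartbeat@" + where());
                done.complete(null);
            } catch (Exception e) {
                done.completeExceptionally(e);
            }
        }, "background");
        background.start();
        return done;
    }

    // App side: trigger the event, then process background events until it completes.
    void close() {
        CompletableFuture<Void> leave = submitLeaveOnClose();
        try {
            while (!leave.isDone()) {
                Runnable event = backgroundEvents.poll(10, TimeUnit.MILLISECONDS);
                if (event != null) {
                    event.run();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while closing", e);
        }
        leave.join(); // surfaces any failure from the background sequence
    }
}
```

      After close() returns, the log shows the callback running in the app thread, followed by the assignment clear and the leave heartbeat in the background thread. The app thread never writes to the assignment itself, which is the property the proposal is after.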

       

      Trying to understand the current approach, I imagine the initial motivation for running the callbacks (and clearing the assignment) in the app thread was to avoid the back-and-forth: app thread close -> background thread leave event -> app thread to run callback -> background thread to clear assignment and send HB. But updating the assignment on the app thread ends up being problematic: the assignment is mainly updated in the background, so also updating it from the app thread opens the door to race conditions on the subscription state.

            People

              lianetm Lianet Magrans
              lianetm Lianet Magrans