KAFKA-10633

Constant probing rebalances in Streams 2.6


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.6.1
    • Component/s: streams
    • Labels: None

    Description

      We are seeing a few issues with the new rebalancing behavior in Streams 2.6. This ticket is for constant probing rebalances on one StreamThread, but I'll mention the other issues, as they may be related.

      First, when we redeploy the application we see tasks being moved, even though the task assignment was stable before the redeploy. We would expect tasks to be assigned back to the same instances, with no movement. The application runs in EC2 with persistent EBS volumes, and we use static group membership to avoid rebalancing. To redeploy the app we terminate all EC2 instances. The new instances reattach the EBS volumes and use the same group member id.
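For readers unfamiliar with the setup described above, the following is a minimal sketch of a static-membership configuration, not the reporter's actual settings. The keys `application.id`, `bootstrap.servers`, `group.instance.id`, `session.timeout.ms` and `state.dir` are standard Kafka configuration names. The class name, the values, and the state directory path are assumptions made for illustration.

```java
import java.util.Properties;

public class StaticMembershipConfigSketch {

    // Builds Streams properties for a statically assigned group member.
    // instanceId is a hypothetical stable identifier that survives a redeploy,
    // for example one tied to the EBS volume that is reattached.
    static Properties build(String applicationId, String bootstrapServers, String instanceId) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Static membership: the same id must be reused by the replacement instance.
        props.put("group.instance.id", instanceId);
        // Assumed value: long enough for a replacement instance to come up
        // before the broker evicts the static member.
        props.put("session.timeout.ms", "300000");
        // Assumed mount point of the persistent volume that holds local state.
        props.put("state.dir", "/mnt/kafka-streams");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("my-streams-app", "localhost:9092", "instance-1");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting `Properties` object would be passed to the `KafkaStreams` constructor. The session timeout must stay within the broker's configured maximum.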

      After redeploying, we sometimes see the group leader go into a tight probing rebalance loop. This doesn't happen immediately; it can start several hours later. Because the redeploy caused task movement, we see the expected probing rebalances every 10 minutes. But then one thread goes into a tight loop, logging messages like "Triggering the followup rebalance scheduled for 1603323868771 ms.", handling the partition assignment (which doesn't change), and then logging "Requested to schedule probing rebalance for 1603323868771 ms." This repeats several times a second until the app is restarted. I'll attach a log export from one such incident.
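One way to spot this pattern in a log export is to check whether the same scheduled timestamp is triggered repeatedly. The sketch below is a hypothetical helper, not part of Kafka. It assumes only the message text quoted above and counts the longest run of consecutive "Triggering the followup rebalance" lines that carry an identical timestamp.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProbingLoopDetector {

    // Matches the log message quoted in the description and captures the timestamp.
    private static final Pattern TRIGGERED =
        Pattern.compile("Triggering the followup rebalance scheduled for (\\d+) ms\\.");

    // Returns the length of the longest run of consecutive trigger messages
    // that share the same scheduled timestamp. Other lines are ignored.
    static int longestRepeatRun(List<String> lines) {
        long previous = -1L;
        int run = 0;
        int longest = 0;
        for (String line : lines) {
            Matcher matcher = TRIGGERED.matcher(line);
            if (!matcher.find()) {
                continue;
            }
            long scheduled = Long.parseLong(matcher.group(1));
            run = (scheduled == previous) ? run + 1 : 1;
            previous = scheduled;
            longest = Math.max(longest, run);
        }
        return longest;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Triggering the followup rebalance scheduled for 1603323868771 ms.",
            "Requested to schedule probing rebalance for 1603323868771 ms.",
            "Triggering the followup rebalance scheduled for 1603323868771 ms.");
        System.out.println("Longest run: " + longestRepeatRun(sample));
    }
}
```

In a healthy application each probing rebalance is scheduled for a new time, so the run length stays at 1. A run that keeps growing for the same timestamp matches the loop described in this ticket.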


          People

            Assignee: Unassigned
            Reporter: Bradley Peterson (thebearmayor)
            Votes: 0
            Watchers: 7
