Description
In some rare cases, the StickyAssignor will fail to balance an assignment without violating stickiness despite a balanced and sticky assignment being possible. The implications of this for cooperative rebalancing are that an unnecessary additional rebalance will be triggered.
This was seen to happen for example when each consumer is subscribed to some random subset of all topics and all their subscriptions change to a different random subset, as occurs in AbstractStickyAssignorTest#testReassignmentWithRandomSubscriptionsAndChanges.
The initial assignment after the random subscription change obviously involved migrating partitions, so following the cooperative protocol those partitions are removed from the balanced first assignment, and a second rebalance is triggered. In some cases, during the second rebalance the assignor was unable to reach a balanced assignment without migrating a few partitions, even though one must have been possible (since the first assignment was balanced). A third rebalance was needed to reach a stable balanced state.
Under the conditions in the previously mentioned test (between 20-40 consumers, 10-20 topics (with 0-20 partitions) this third rebalance was required roughly 30% of the time. Some initial improvements to the sticky assignment logic reduced this to under 15%, but we should consider closing this gap and optimizing the cooperative sticky assignment