Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
I haven't run the code, but from reading GroupMetadataManager.addGroup it looks like we can get a NullPointerException if a group is removed between the groupsCache.putIfNotExists and groupsCache.get lines and the caller then uses the result of addGroup. One way this can happen is by interleaving GroupMetadataManager.addGroup with GroupMetadataManager.removeGroupsForPartition.
Here's the scenario:
- thread-1 is in the middle of adding a group g, which lives in offset topic partition p. thread-1 has already executed the groupsCache.putIfNotExists line in GroupMetadataManager.addGroup.
- thread-2 is in the middle of migrating all groups for partition p. thread-2 is in GroupMetadataManager.removeGroupsForPartition and has just called groupsCache.remove("g").
- thread-1 now executes groupsCache.get("g"), which returns null since it's now gone.
- thread-1 now returns to GroupCoordinator.doJoinGroup with a null GroupMetadata and then tries to do a group synchronized {...}, resulting in an NPE.
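
To make the interleaving concrete, here is a minimal Java sketch of the pattern described above. It is not the Kafka code: GroupCacheSketch, Group, addGroupRacy, addGroupSafe and the betweenSteps hook are all hypothetical names, and a plain ConcurrentHashMap with putIfAbsent stands in for groupsCache and putIfNotExists. The hook runs between the two cache calls so that thread-2's remove can be replayed deterministically on a single thread. The second method shows one possible way to avoid the null result, by returning the value from the insert call instead of reading the cache a second time. Whether that matches the fix actually applied is not something this sketch claims.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the group cache logic described in this report.
final class GroupCacheSketch {

    // Hypothetical stand-in for GroupMetadata.
    static final class Group {
        final String id;
        Group(String id) { this.id = id; }
    }

    // Racy pattern: insert if absent, then read back with a separate get.
    // betweenSteps simulates whatever another thread does between the two calls.
    static Group addGroupRacy(ConcurrentHashMap<String, Group> cache, Group group,
                              Runnable betweenSteps) {
        cache.putIfAbsent(group.id, group);
        betweenSteps.run();
        return cache.get(group.id); // null if the entry was removed in between
    }

    // Possible alternative: use the result of the insert itself, so the
    // return value never depends on a second lookup.
    static Group addGroupSafe(ConcurrentHashMap<String, Group> cache, Group group,
                              Runnable betweenSteps) {
        Group existing = cache.putIfAbsent(group.id, group);
        betweenSteps.run();
        return existing != null ? existing : group;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Group> cache = new ConcurrentHashMap<>();
        Group g = new Group("g");

        // Replay thread-2's groupsCache.remove("g") between the two steps.
        Group racy = addGroupRacy(cache, g, () -> cache.remove("g"));
        System.out.println("racy result is null: " + (racy == null));

        Group safe = addGroupSafe(cache, g, () -> cache.remove("g"));
        System.out.println("safe result is null: " + (safe == null));
    }
}
```

Note that the second variant only guarantees a non-null return value. The group may still have been removed from the cache by the time the caller uses it, so the caller would still need some way to detect that, which is outside the scope of this sketch.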