Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- None
- None
Description
When server2 initializes at the same time as a client registers a CQ while connected to server1, the following sequence leaves server2 with a wrong view of which CQs server1 is serving:
1. server2 creates the partitioned region and sends a CreateRegionMessage to server1
2. server1 processes this message and adds server2 to its profile list
3. server1 replies to server2 with a CreateRegionReplyMessage
4. At the same time, server1 registers the CQ locally and sends a REGISTER_CQ message to server2
5. The REGISTER_CQ message reaches server2. server2 doesn't have server1's cache profile yet, so it tries to save the message to the "to_be_processed" queue, but gets stuck there.
6. Meanwhile, the CreateRegionReplyMessage from #3 reaches server2. server2 now has server1's cache profile, processes the message, and then processes everything in the "to_be_processed" queue.
7. Now #5 gets unstuck and continues, adding the message to the queue, but the queue is never processed again, so from server2's viewpoint, server1 is not serving that CQ.
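
The sequence above is a check-then-act race on the deferred-message queue. Below is a minimal Java sketch of that interleaving, replayed deterministically on a single thread so the outcome is reproducible. Everything in it is hypothetical and for illustration only: the class DeferredQueueRace, its fields, and its method names are assumptions, not the product's actual classes, and the synchronized variant is just one possible way to close the window, not necessarily the fix that was applied for this issue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical model of server2's deferred-message handling (illustration only). */
final class DeferredQueueRace {
    private boolean profileKnown = false;                 // has server1's profile arrived?
    private final Queue<String> toBeProcessed = new ArrayDeque<>();
    private final List<String> applied = new ArrayList<>();

    List<String> applied() { return applied; }
    int pending() { return toBeProcessed.size(); }

    // Step 5, first half: the handler checks for the profile and decides to defer.
    boolean mustDefer() { return !profileKnown; }

    // Step 7: the delayed second half of step 5, run without re-checking the profile.
    void enqueue(String msg) { toBeProcessed.add(msg); }

    // Step 6: the reply arrives, the profile is recorded, and the queue is drained once.
    void onProfileArrived() { profileKnown = true; drain(); }

    private void drain() {
        String msg;
        while ((msg = toBeProcessed.poll()) != null) {
            applied.add(msg);
        }
    }

    // Assumed remedy: check and enqueue under the same lock that guards the drain.
    synchronized void receiveAtomically(String msg) {
        if (profileKnown) {
            applied.add(msg);
        } else {
            toBeProcessed.add(msg);
        }
    }

    synchronized void onProfileArrivedAtomically() { profileKnown = true; drain(); }

    /** Replays steps 5, 6, 7 in the problematic order. */
    static DeferredQueueRace buggyInterleaving() {
        DeferredQueueRace server2 = new DeferredQueueRace();
        boolean defer = server2.mustDefer();   // step 5: decides to defer, then stalls
        server2.onProfileArrived();            // step 6: drains a still-empty queue
        if (defer) {
            server2.enqueue("REGISTER_CQ");    // step 7: enqueued after the only drain
        }
        return server2;
    }

    /** With atomic check-and-enqueue, either arrival order applies the message. */
    static DeferredQueueRace atomicInterleaving(boolean messageFirst) {
        DeferredQueueRace server2 = new DeferredQueueRace();
        if (messageFirst) {
            server2.receiveAtomically("REGISTER_CQ");
            server2.onProfileArrivedAtomically();
        } else {
            server2.onProfileArrivedAtomically();
            server2.receiveAtomically("REGISTER_CQ");
        }
        return server2;
    }

    public static void main(String[] args) {
        DeferredQueueRace buggy = buggyInterleaving();
        System.out.println("buggy:  applied=" + buggy.applied() + " pending=" + buggy.pending());
        DeferredQueueRace fixed = atomicInterleaving(true);
        System.out.println("atomic: applied=" + fixed.applied() + " pending=" + fixed.pending());
    }
}
```

In the buggy replay the REGISTER_CQ message ends up pending in the queue and is never applied, which matches step 7. In the atomic variant the message is applied whichever of the two events arrives first, because the decision to defer can no longer be separated from the enqueue.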