[GEODE-7012] Distributed deadlock with StartupMessages if executor pools get full - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.10.0
Fix Version/s: 1.10.0
Component/s: None
Labels: None

Description

We hit a distributed deadlock in one of our tests where two members are hung sending startup messages to each other.

It turns out that until a member gets a response to its StartupMessage, it is in a state where it blocks all outgoing messages. At the same time, the member is receiving and attempting to respond to other messages, but those responses get blocked. If too many messages come in before the StartupResponseMessage arrives, the blocked responses end up filling the ClusterDistributionManager.highPriorityPool.

If two members are trying to start up at the same time and both fill up their highPriorityPool, neither can process the other's StartupMessage, because that message is executed in the same pool.
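The starvation pattern described above can be reproduced in miniature inside a single JVM. The sketch below is only an analogy under stated assumptions, not Geode's code: the class name `PoolStarvationSketch`, the method `startupProcessed`, the pool sizes, and the timeout are all hypothetical. A fixed-size pool stands in for the highPriorityPool, tasks that wait on a latch stand in for responses blocked until startup completes, and a final task queued in the same pool stands in for the peer's StartupMessage.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolStarvationSketch {

  /**
   * Returns true if the stand-in "startup" task ran before the timeout.
   * All names and parameters here are illustrative assumptions.
   */
  static boolean startupProcessed(int poolSize, int blockedReplies, long timeoutMillis)
      throws InterruptedException {
    // Stand-in for the high priority pool: fixed thread count, unbounded queue.
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    CountDownLatch startupDone = new CountDownLatch(1);
    try {
      // Each "reply" occupies a pool thread until startup has completed.
      for (int i = 0; i < blockedReplies; i++) {
        pool.execute(() -> {
          try {
            startupDone.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // The task that would unblock everything is submitted to the same pool.
      pool.execute(startupDone::countDown);
      return startupDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } finally {
      // Interrupt any threads still waiting so the JVM can exit.
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("one free thread: " + startupProcessed(4, 3, 500));
    System.out.println("pool full:       " + startupProcessed(4, 4, 500));
  }
}
```

With one thread left free, the startup task runs and the method returns true. Once every thread is held by a blocked reply, the startup task sits in the queue and the method returns false after the timeout. This matches the shape of the hang reported here, where the message needed to make progress waits behind the work it would unblock.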

Issue Links

links to

GitHub Pull Request #3844

GitHub Pull Request #3877

People

Assignee: Ernest Burghardt

Reporter: Dan Smith

Votes: 0

Watchers: 2

Dates

Created: 24/Jul/19 23:11

Updated: 26/Sep/19 18:05

Resolved: 05/Aug/19 17:56

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 10m