Geode / GEODE-7012

Distributed deadlock with StartupMessages if executor pools get full

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.10.0
    • Component/s: None
    • Labels: None

      Description

      We hit a distributed deadlock in one of our tests where two members are hung sending startup messages to each other.

      It turns out that until a member gets a response to a StartupMessage, it is in a state where it blocks all outgoing messages. At the same time, the member is receiving and attempting to respond to other messages, but those responses get blocked. If too many messages come in before the StartupResponseMessage arrives, the blocked responses end up filling the ClusterDistributionManager.highPriorityPool.

      If two members are trying to start up at the same time, and they both fill up the highPriorityPool, they both will fail to process each other's StartupMessage, because that message is executed in the same pool.
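
      The starvation described above can be sketched for a single member in plain Java. This is a simplified illustration, not Geode code: the class StartupPoolStarvation, the method startupMessageProcessed, and both latches are hypothetical names, and a fixed-size java.util.concurrent pool stands in for ClusterDistributionManager.highPriorityPool. Responders block on a latch that represents the missing StartupResponseMessage, and the peer's StartupMessage is submitted to the same pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StartupPoolStarvation {

  /**
   * Simulates one starting member (hypothetical model, not Geode's implementation).
   * Returns true if the peer's StartupMessage was processed within waitMillis.
   */
  public static boolean startupMessageProcessed(int poolSize, int blockedResponders,
      long waitMillis) throws InterruptedException {
    // Stand-in for the high priority pool: a fixed number of worker threads.
    ExecutorService highPriorityPool = Executors.newFixedThreadPool(poolSize);
    // Never counted down here: the response would have to come from the peer,
    // which in the reported scenario is stuck in exactly the same way.
    CountDownLatch startupResponseReceived = new CountDownLatch(1);
    // Counted down when the peer's StartupMessage is finally handled.
    CountDownLatch peerStartupHandled = new CountDownLatch(1);
    try {
      // Incoming messages whose replies are blocked until startup completes.
      for (int i = 0; i < blockedResponders; i++) {
        highPriorityPool.execute(() -> {
          try {
            startupResponseReceived.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // The peer's StartupMessage is executed in the same pool.
      highPriorityPool.execute(peerStartupHandled::countDown);
      return peerStartupHandled.await(waitMillis, TimeUnit.MILLISECONDS);
    } finally {
      // Interrupt the blocked workers so the simulation can terminate.
      highPriorityPool.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("pool full:     " + startupMessageProcessed(2, 2, 200));
    System.out.println("one free slot: " + startupMessageProcessed(2, 1, 200));
  }
}
```

      With every worker thread occupied by a blocked responder, the StartupMessage task stays queued and the method returns false; with one thread left free it returns true. When two members reach the first state at the same time, neither can send the response the other is waiting for, which is the deadlock reported here.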

            People

            • Assignee: Ernest Burghardt (eburghardt)
            • Reporter: Dan Smith (upthewaterspout)
            • Votes: 0
            • Watchers: 2

                Time Tracking

                • Original Estimate: Not Specified
                • Remaining Estimate: 0h
                • Time Spent: 1h 10m