Geode / GEODE-7569

Hang during StateFlush due to flipping containsRegionContentChange on PartitionMessageWithDirectReply

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: membership
    • Labels: None

    Description

      The recent changes for GEODE-7435 (commit e3a31e190031f094ac3bd1517722d6bead710418) cause a distributed deadlock when making a copy of a bucket.

      These changes flipped the value of containsRegionContentChange for PartitionMessageWithDirectReply.

      That flag controls which messages participate in a state flush operation. As a result of the flip, many more messages are now part of a state flush, including messages that trigger bucket creation. This causes the following distributed deadlock:

      1. Member A is waiting for a StateFlush to finish.
      2. Member B is stuck in StateStabilizationMessage, waiting for in-flight messages to be processed.
      3. Member B is in the middle of processing some messages, which is what is holding up the StateStabilizationMessage.
      4. Some of those messages are PartitionMessageWithDirectReply messages that end up triggering createBucketAtomically. That method blocks, waiting for bucket creation on Member A to finish.
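
      The four steps above form a cycle of waits. The following is a toy model of that cycle, not Geode code: the Message class, the node names, and the helper methods are all hypothetical. It assumes that a true flag means the state flush waits for the message, and that the bucket creation on Member A cannot finish until A's state flush completes (implied by "making a copy of a bucket", but not stated outright above).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy wait-for graph of the reported hang. Hypothetical model, not Geode code. */
class StateFlushDeadlockSketch {

  /** Hypothetical stand-in for a message being processed on Member B. */
  static final class Message {
    final String name;
    final boolean containsRegionContentChange; // assumed: true => state flush waits for it
    final String blocksOn;                     // node this message's processing waits on

    Message(String name, boolean containsRegionContentChange, String blocksOn) {
      this.name = name;
      this.containsRegionContentChange = containsRegionContentChange;
      this.blocksOn = blocksOn;
    }
  }

  private static void edge(Map<String, List<String>> g, String waiter, String awaited) {
    g.computeIfAbsent(waiter, k -> new ArrayList<>()).add(awaited);
  }

  /** Builds edges of the form "waiter -> thing it waits for". */
  static Map<String, List<String>> buildWaitGraph(List<Message> inFlightOnB) {
    Map<String, List<String>> g = new HashMap<>();
    // Assumption: A's bucket creation cannot finish before A's state flush does.
    edge(g, "A:bucketCreation", "A:StateFlush");
    // Steps 1 and 2: A's flush waits on B's StateStabilizationMessage.
    edge(g, "A:StateFlush", "B:StateStabilizationMessage");
    for (Message m : inFlightOnB) {
      // Step 3: only flagged messages hold up the StateStabilizationMessage.
      if (m.containsRegionContentChange) {
        edge(g, "B:StateStabilizationMessage", m.name);
      }
      // Step 4: processing the message blocks (e.g. in createBucketAtomically).
      edge(g, m.name, m.blocksOn);
    }
    return g;
  }

  /** Depth-first search for a cycle in the wait-for graph. */
  static boolean hasCycle(Map<String, List<String>> g) {
    Set<String> visited = new HashSet<>();
    Set<String> onPath = new HashSet<>();
    for (String node : g.keySet()) {
      if (dfs(node, g, visited, onPath)) {
        return true;
      }
    }
    return false;
  }

  private static boolean dfs(String node, Map<String, List<String>> g,
                             Set<String> visited, Set<String> onPath) {
    if (onPath.contains(node)) {
      return true; // reached a node already on the current path: cycle
    }
    if (!visited.add(node)) {
      return false; // fully explored earlier, no cycle through it
    }
    onPath.add(node);
    for (String next : g.getOrDefault(node, Collections.<String>emptyList())) {
      if (dfs(next, g, visited, onPath)) {
        return true;
      }
    }
    onPath.remove(node);
    return false;
  }

  /** True if the modeled scenario deadlocks for the given flag value. */
  static boolean deadlocks(boolean flagValue) {
    Message m = new Message("B:PartitionMessageWithDirectReply", flagValue, "A:bucketCreation");
    return hasCycle(buildWaitGraph(Collections.singletonList(m)));
  }

  public static void main(String[] args) {
    System.out.println("flag=true  -> cycle: " + deadlocks(true));
    System.out.println("flag=false -> cycle: " + deadlocks(false));
  }
}
```

      In this model the cycle exists only when the message is flagged as part of the state flush. With the flag false, the message still blocks on Member A, but nothing in the flush waits for it, so the flush can finish and the bucket creation can proceed.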

            People

              echobravo Ernest Burghardt
              upthewaterspout Dan Smith
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m