GEODE-7055

Deadlock with StartupMessages if P2P error requiring a sendFailureReply


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.10.0
    • Component/s: membership
    • Labels: None

      Description

      When an error or exception on the P2P message thread requires a FailureReply to be sent, but the StartupResponse message has not yet been received (on the P2P message thread), the failure reply will DEADLOCK in the call to
      org.apache.geode.distributed.internal.ClusterDistributionManager.waitUntilReadyToSendMsgs
      because the StartupOperation is already blocked in waitForReplies() for the StartupResponse.

      // below is an example of an Exception triggering the DEADLOCK
      

       

      [fatal 2019/08/05 22:47:06.462 UTC <P2P message reader for 10.0.8.10(cacheserver-28663bad-c0b0-41f7-b723-5a2425fa54ff:1)<v5>:56152(version:GEODE 1.9.0) shared unordered uid=63 port=49194> tid=0x25] Error deserializing message
      java.lang.ClassNotFoundException: org.apache.geode.modules.util.BootstrappingFunction
              at org.apache.geode.internal.ClassPathLoader.forName(ClassPathLoader.java:180)
              at org.apache.geode.internal.InternalDataSerializer.getCachedClass(InternalDataSerializer.java:3274)
              at org.apache.geode.DataSerializer.readClass(DataSerializer.java:264)
              at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2398)
              at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2673)
              at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2968)
              at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.fromData(MemberFunctionStreamingMessage.java:277)
              at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2372)
              at org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory.java:997)
              at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2516)
              at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2528)
              at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3111)
              at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2920)
              at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1745)
              at org.apache.geode.internal.tcp.Connection.run(Connection.java:1577)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      
              "P2P message reader for 10.0.8.10(cacheserver-28663bad-c0b0-41f7-b723-5a2425fa54ff:1)<v5>:56152(version:GEODE 1.9.0) shared unordered uid=63 port=49194" #37 daemon prio=10 os_prio=0 tid=0x00007f4a108bb800 nid=0x2a in Object.wait() [0x00007f4a0dca7000]
         java.lang.Thread.State: WAITING (on object monitor)
              at java.lang.Object.wait(Native Method)
              - waiting on <0x00000006d39c4538> (a java.lang.Object)
              at java.lang.Object.wait(Object.java:502)
              at org.apache.geode.distributed.internal.ClusterDistributionManager.waitUntilReadyToSendMsgs(ClusterDistributionManager.java:1212)
              - locked <0x00000006d39c4538> (a java.lang.Object)
              at org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2816)
              at org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1528)
              at org.apache.geode.distributed.internal.ReplyMessage.send(ReplyMessage.java:113)
              at org.apache.geode.distributed.internal.ReplyMessage.send(ReplyMessage.java:86)
              at org.apache.geode.internal.tcp.Connection.sendFailureReply(Connection.java:1954)
              at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3162)
              at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2920)
              at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1745)
              at org.apache.geode.internal.tcp.Connection.run(Connection.java:1577)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
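
The circular wait described above can be modeled in a few lines. This is only a minimal sketch, not Geode code: the class, method, and message names are hypothetical, and it assumes the StartupResponse is queued behind the unreadable message on the same reader thread. A bounded wait stands in for the unbounded Object.wait() seen in the thread dump so that the sketch terminates.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the hang; none of these names are Geode classes.
public class StartupDeadlockSketch {

  /** Returns true if the failure reply could be sent within timeoutMs. */
  public static boolean failureReplySent(long timeoutMs) {
    // Gate that opens only once the startup response has been processed.
    CountDownLatch readyToSendMsgs = new CountDownLatch(1);

    // Assumption: the bad message arrives ahead of the startup response
    // and both are handled by the same single reader thread.
    BlockingQueue<String> inbound = new LinkedBlockingQueue<>();
    inbound.add("UNREADABLE_MESSAGE");
    inbound.add("STARTUP_RESPONSE");

    ExecutorService reader = Executors.newSingleThreadExecutor();
    Future<Boolean> result = reader.submit(() -> {
      boolean replySent = false;
      while (!inbound.isEmpty()) {
        String msg = inbound.take();
        if (msg.equals("STARTUP_RESPONSE")) {
          // Processing the startup response is what opens the gate.
          readyToSendMsgs.countDown();
        } else {
          // Deserialization failed, so a failure reply must be sent. Sending
          // waits on the gate, but the only thread able to open the gate is
          // this one, and it is blocked right here.
          replySent = readyToSendMsgs.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
      }
      return replySent;
    });
    reader.shutdown();

    try {
      return result.get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The wait times out because the startup response is never reached in time.
    System.out.println("failure reply sent: " + failureReplySent(200)); // prints "failure reply sent: false"
  }
}
```

With an unbounded wait, as in the thread dump, the reader thread in this model would never return to the queue, so the gate would never open.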
      

       

              People

              • Assignee: Dan Smith (upthewaterspout)
              • Reporter: Ernest Burghardt (eburghardt)
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 0.5h