Geode / GEODE-5056

ParallelGatewaySenderOperationsDUnitTest.testParallelPropagationSenderStartAfterStop_Scenario2 intermittently fails

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.6.0
    • wan

    Description

      After fixing GEODE-4942, I found at least one race condition that is still not covered.

       

      [vm6] [debug 2018/04/11 16:47:35.189 PDT <PartitionedRegion Message Processor2> tid=110] WAN: On primary bucket 57, setting the seq number as 1357

       

      [vm7] [info 2018/04/11 16:47:35.150 PDT <RMI TCP Connection(1)-10.118.19.25> tid=19] Started  ParallelGatewaySender{id=ln,remoteDsId=2,isRunning =true}

       

      [vm7] [debug 2018/04/11 16:47:35.189 PDT <P2P message reader for 10.118.19.25(27489)<v3>:32781 shared ordered uid=7 port=59148> tid=95] WAN: On secondary bucket 57, setting the seq number as 1357

      [vm7] [debug 2018/04/11 16:47:35.190 PDT <P2P message reader for 10.118.19.25(27489)<v3>:32781 shared ordered uid=7 port=59148> tid=95] Key : ----> 1357

      [vm6] [debug 2018/04/11 16:47:35.190 PDT <PartitionedRegion Message Processor2> tid=110] register dropped event for primary queue. BucketId is 57, shadowKey is 1357, prQ is /ln_PARALLEL_GATEWAY_SENDER_QUEUE

       

      ----- Note: vm6's sender is restarted and cleans up the map before the QueueRemovalMessage is sent out for the map.

      [vm6] [info 2018/04/11 16:47:35.249 PDT <RMI TCP Connection(1)-10.118.19.25> tid=19] Started  ParallelGatewaySender{id=ln,remoteDsId=2,isRunning =true}

      [vm6] [debug 2018/04/11 16:47:35.437 PDT <BatchRemovalThread for GatewaySender_ln_0> tid=118] BatchRemovalThread about to query the batch removal map {/ln_PARALLEL_GATEWAY_SENDER_QUEUE={96=[1396], 2=[1402], 83=[1383], 6=[1406], 71=[1371], 87=[1387], 73=[1373], 90=[1390], 77=[1377], 94=[1394]}}

      [vm6] [debug 2018/04/11 16:47:35.753 PDT <BatchRemovalThread for GatewaySender_ln_0> tid=118] BatchRemovalThread about to query the batch removal map {/ln_PARALLEL_GATEWAY_SENDER_QUEUE={49=[1449], 65=[1465], 83=[1483], 53=[1453], 71=[1471], 87=[1487], 57=[1457], 73=[1473], 77=[1477], 62=[1462]}}

      ---- Note: shadowKey 1457 was created after the sender was restarted.

       

      [vm6] [debug 2018/04/11 16:47:35.438 PDT <BatchRemovalThread for GatewaySender_ln_0> tid=118] Sending (ParallelQueueRemovalMessage@2344969b processorId=0 sender=10.118.19.25(27489)<v3>:32781) to 3 peers ([10.118.19.25(27492)<v4>:32783@4(GEODE 1.6.0), 10.118.19.25(27485)<v2>:32779@1(GEODE 1.6.0), 10.118.19.25(27482)<v1>:32778@2(GEODE 1.6.0)]) via tcp/ip

      [vm7] [debug 2018/04/11 16:47:35.439 PDT <P2P message reader for 10.118.19.25(27489)<v3>:32781 shared unordered uid=4 port=59119> tid=52] Received message 'ParallelQueueRemovalMessage@11583f5b processorId=0 sender=10.118.19.25(27489)<v3>:32781' from <10.118.19.25(27489)<v3>:32781>

       

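      That is, the dropped key (shadowKey 1357 for bucket 57) was in the map, but the sender was stopped and cleared the map before a QueueRemovalMessage could be sent, so the key never appears in a batch removal map that is sent to the peers.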
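
      The interleaving described above can be modeled with a small, single-threaded Java sketch. It is not Geode code: the class and method names (DroppedEventRaceSketch, PrimarySender, SecondaryQueue, registerDroppedEvent, restart, drainForRemovalMessage) are hypothetical stand-ins, and the assumption that the secondary keeps the key when no removal message names it is inferred from the logs rather than confirmed by them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, single-threaded model of the interleaving in the logs.
 * None of these names are Geode APIs; they only illustrate the ordering problem.
 */
final class DroppedEventRaceSketch {

  /** Models the primary's batch removal map: bucketId -> shadow keys the peers should remove. */
  static final class PrimarySender {
    final Map<Integer, Set<Long>> batchRemovalMap = new HashMap<>();

    void registerDroppedEvent(int bucketId, long shadowKey) {
      batchRemovalMap.computeIfAbsent(bucketId, b -> new HashSet<>()).add(shadowKey);
    }

    /** Models the cleanup performed when the sender is stopped and started again. */
    void restart() {
      batchRemovalMap.clear();
    }

    /** Models the batch removal thread: take the current contents, then reset the map. */
    Map<Integer, Set<Long>> drainForRemovalMessage() {
      Map<Integer, Set<Long>> snapshot = new HashMap<>(batchRemovalMap);
      batchRemovalMap.clear();
      return snapshot;
    }
  }

  /** Models the secondary's copy of the queue: bucketId -> shadow keys still enqueued. */
  static final class SecondaryQueue {
    final Map<Integer, Set<Long>> buckets = new HashMap<>();

    void enqueue(int bucketId, long shadowKey) {
      buckets.computeIfAbsent(bucketId, b -> new HashSet<>()).add(shadowKey);
    }

    void applyRemovalMessage(Map<Integer, Set<Long>> removals) {
      removals.forEach((bucketId, keys) -> {
        Set<Long> held = buckets.get(bucketId);
        if (held != null) {
          held.removeAll(keys);
        }
      });
    }

    boolean holds(int bucketId, long shadowKey) {
      Set<Long> held = buckets.get(bucketId);
      return held != null && held.contains(shadowKey);
    }
  }

  /** Returns true if the secondary still holds shadow key 1357 after the removal message. */
  static boolean secondaryLeaksKey(boolean restartBeforeRemovalMessage) {
    PrimarySender primary = new PrimarySender();
    SecondaryQueue secondary = new SecondaryQueue();

    secondary.enqueue(57, 1357L);            // secondary stores the event
    primary.registerDroppedEvent(57, 1357L); // primary drops it and records the key

    if (restartBeforeRemovalMessage) {
      primary.restart();                     // the map is cleared before it is sent
    }
    secondary.applyRemovalMessage(primary.drainForRemovalMessage());
    return secondary.holds(57, 1357L);
  }

  public static void main(String[] args) {
    System.out.println("restart before message: leaked=" + secondaryLeaksKey(true));
    System.out.println("no restart:             leaked=" + secondaryLeaksKey(false));
  }
}
```

      In this model the key is leaked only when restart() runs between registerDroppedEvent() and drainForRemovalMessage(), which matches the ordering of the vm6 log lines at 16:47:35.190, 16:47:35.249 and 16:47:35.437.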

            People

              Assignee: zhouxj Xiaojian Zhou
              Reporter: zhouxj Xiaojian Zhou
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m