Geode / GEODE-5056

ParallelGatewaySenderOperationsDUnitTest.testParallelPropagationSenderStartAfterStop_Scenario2 intermittently fails


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: wan

    Description

      After fixing GEODE-4942, I found at least one race condition that is still not covered.

       

      [vm6] [debug 2018/04/11 16:47:35.189 PDT <PartitionedRegion Message Processor2> tid=110] WAN: On primary bucket 57, setting the seq number as 1357

       

      [vm7] [info 2018/04/11 16:47:35.150 PDT <RMI TCP Connection(1)-10.118.19.25> tid=19] Started  ParallelGatewaySender{id=ln,remoteDsId=2,isRunning =true}

       

      [vm7] [debug 2018/04/11 16:47:35.189 PDT <P2P message reader for 10.118.19.25(27489)<v3>:32781 shared ordered uid=7 port=59148> tid=95] WAN: On secondary bucket 57, setting the seq number as 1357

      [vm7] [debug 2018/04/11 16:47:35.190 PDT <P2P message reader for 10.118.19.25(27489)<v3>:32781 shared ordered uid=7 port=59148> tid=95] Key : ----> 1357

      [vm6] [debug 2018/04/11 16:47:35.190 PDT <PartitionedRegion Message Processor2> tid=110] register dropped event for primary queue. BucketId is 57, shadowKey is 1357, prQ is /ln_PARALLEL_GATEWAY_SENDER_QUEUE

       

      ----- Note: vm6's sender is restarted and cleans up the map before the QueueRemovalMessage is sent out for the map.

      [vm6] [info 2018/04/11 16:47:35.249 PDT <RMI TCP Connection(1)-10.118.19.25> tid=19] Started  ParallelGatewaySender{id=ln,remoteDsId=2,isRunning =true}

      [vm6] [debug 2018/04/11 16:47:35.437 PDT <BatchRemovalThread for GatewaySender_ln_0> tid=118] BatchRemovalThread about to query the batch removal map {/ln_PARALLEL_GATEWAY_SENDER_QUEUE={96=[1396], 2=[1402], 83=[1383], 6=[1406], 71=[1371], 87=[1387], 73=[1373], 90=[1390], 77=[1377], 94=[1394]}}

      [vm6] [debug 2018/04/11 16:47:35.753 PDT <BatchRemovalThread for GatewaySender_ln_0> tid=118] BatchRemovalThread about to query the batch removal map {/ln_PARALLEL_GATEWAY_SENDER_QUEUE={49=[1449], 65=[1465], 83=[1483], 53=[1453], 71=[1471], 87=[1487], 57=[1457], 73=[1473], 77=[1477], 62=[1462]}}

      ---- Note: shadowKey 1457 was created after the sender was restarted, so it is not the dropped key 1357.

       

      [vm6] [debug 2018/04/11 16:47:35.438 PDT <BatchRemovalThread for GatewaySender_ln_0> tid=118] Sending (ParallelQueueRemovalMessage@2344969b processorId=0 sender=10.118.19.25(27489)<v3>:32781) to 3 peers ([10.118.19.25(27492)<v4>:32783@4(GEODE 1.6.0), 10.118.19.25(27485)<v2>:32779@1(GEODE 1.6.0), 10.118.19.25(27482)<v1>:32778@2(GEODE 1.6.0)]) via tcp/ip

      [vm7] [debug 2018/04/11 16:47:35.439 PDT <P2P message reader for 10.118.19.25(27489)<v3>:32781 shared unordered uid=4 port=59119> tid=52] Received message 'ParallelQueueRemovalMessage@11583f5b processorId=0 sender=10.118.19.25(27489)<v3>:32781' from <10.118.19.25(27489)<v3>:32781>

       

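      In other words, the dropped key (1357) was in the batch removal map, but the sender was closed and cleared the map before a QueueRemovalMessage could be sent, so the secondary (vm7) never learned that the key should be removed.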
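
      The interleaving in the log can be modeled as a small, single-threaded sketch. This is not Geode code: the class and method names (DroppedEventRaceSketch, PrimarySender, SecondaryQueue, registerDroppedEvent, drainForRemovalMessage) are hypothetical stand-ins, and the real sender uses separate threads and messaging. The sketch only assumes what the log shows: the primary records the dropped shadow key in a removal map, a restart clears that map, and the secondary removes a key only when a removal message names it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the race described above; none of these types exist in Geode. */
public class DroppedEventRaceSketch {

  /** Stand-in for the primary (vm6): bucketId -> shadow keys waiting to be announced. */
  static final class PrimarySender {
    final Map<Integer, Set<Long>> batchRemovalMap = new HashMap<>();

    void registerDroppedEvent(int bucketId, long shadowKey) {
      batchRemovalMap.computeIfAbsent(bucketId, b -> new HashSet<>()).add(shadowKey);
    }

    /** Models the sender restart: the map is cleaned up, sent or not. */
    void restart() {
      batchRemovalMap.clear();
    }

    /** Models one pass of the removal thread: take the current contents, then empty the map. */
    Map<Integer, Set<Long>> drainForRemovalMessage() {
      Map<Integer, Set<Long>> snapshot = new HashMap<>(batchRemovalMap);
      batchRemovalMap.clear();
      return snapshot;
    }
  }

  /** Stand-in for the secondary (vm7): keeps a key until a removal message names it. */
  static final class SecondaryQueue {
    final Map<Integer, Set<Long>> buckets = new HashMap<>();

    void enqueue(int bucketId, long shadowKey) {
      buckets.computeIfAbsent(bucketId, b -> new HashSet<>()).add(shadowKey);
    }

    void applyRemovalMessage(Map<Integer, Set<Long>> message) {
      message.forEach((bucketId, keys) ->
          buckets.getOrDefault(bucketId, new HashSet<>()).removeAll(keys));
    }
  }

  /** Replays the log's order of events; returns the keys left in secondary bucket 57. */
  static Set<Long> run(boolean restartBeforeDrain) {
    PrimarySender primary = new PrimarySender();
    SecondaryQueue secondary = new SecondaryQueue();

    secondary.enqueue(57, 1357L);              // secondary queues key 1357
    primary.registerDroppedEvent(57, 1357L);   // primary drops the event and records it

    if (restartBeforeDrain) {
      primary.restart();                       // map cleared before anything is sent
    }
    secondary.applyRemovalMessage(primary.drainForRemovalMessage());

    return secondary.buckets.get(57);
  }

  public static void main(String[] args) {
    System.out.println("no restart:           " + run(false));
    System.out.println("restart before drain: " + run(true));
  }
}
```

      With the restart placed before the drain, key 1357 stays in the secondary's bucket 57, which matches the stale entry the test trips over. Without the restart, the removal message carries 1357 and the bucket ends up empty.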


          People

            zhouxj Xiaojian Zhou