Geode / GEODE-6570

processing of cached join request delays view installation


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0
    • Component/s: membership
    • Labels: None

    Description

      In a test that kills and restarts locators, one of the restarting locators times out while trying to join the distributed system.  Logs show that another locator was becoming the membership coordinator and was delayed in sending out a membership view because it processed a different join request, one for a member that was already in the distributed system.

      locator A gets join request from node 1 and sends a PREPARE

      node 1 sets its identity's view ID using the PREPAREd view

      locator A is killed

      node 1 sends a join request to locator B.  Its identity has a view ID set.

      node 2 sends a join request to locator B and gets a PREPARE

      locator B processes node 1's join request and assigns a new view ID to it

      locator B processes node 2's join request and assigns a new view ID to it

      locator B sends the PREPARE with these two new nodes.  The view also still contains node 1's original ID, so node 1 appears in it twice

      locator B times out waiting for a response from the node 1 identity that carries the new view ID and declares it crashed.  It sends out a new PREPARE without that address.

      node 2 gives up waiting

      locator B gets no response from node 2 and declares it crashed, sends out a new PREPARE without node 2 and succeeds.
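
      The sequence above turns on locator B assigning a new view ID to a member that is already in its view. The following is a minimal Java sketch of one way a coordinator could avoid that, by skipping a queued join request whose address is already present. All names here (JoinRequestSketch, MemberId, prepareView) are hypothetical and are not taken from Geode's source, and this is not necessarily the fix that was applied for this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from Geode's source. */
final class JoinRequestSketch {

    /** A member identity: address plus the view ID it joined in (-1 means not yet assigned). */
    static final class MemberId {
        final String host;
        final int port;
        final int viewId;

        MemberId(String host, int port, int viewId) {
            this.host = host;
            this.port = port;
            this.viewId = viewId;
        }

        /** True when both IDs name the same address, regardless of view ID. */
        boolean sameAddress(MemberId other) {
            return host.equals(other.host) && port == other.port;
        }

        @Override
        public String toString() {
            return host + ":" + port + (viewId >= 0 ? "<v" + viewId + ">" : "");
        }
    }

    /**
     * Builds the member list for the next view. A join request whose address is
     * already in the list is skipped instead of being re-added under a new view ID.
     */
    static List<MemberId> prepareView(List<MemberId> current,
                                      List<MemberId> joinRequests,
                                      int newViewId) {
        List<MemberId> next = new ArrayList<>(current);
        for (MemberId request : joinRequests) {
            boolean alreadyMember = next.stream().anyMatch(m -> m.sameAddress(request));
            if (!alreadyMember) {
                next.add(new MemberId(request.host, request.port, newViewId));
            }
        }
        return next;
    }
}
```

      With a current view that already holds node 1 under v46, a queued join request from node 1 is ignored and only node 2 is added under the new view ID, so the PREPARE does not wait on an identity that no process will answer for.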

      Here are log snippets showing the problem.  Process 616 has a JoinRequest queued when this locator becomes coordinator.  The JoinRequest ID has v46 already in it, showing that a PREPARE has already been sent with this member in it.

      The locator then creates a new View that has process 616's ID in it twice: once with v46 and once with v60.

      locatorgemfire_2_2_29835/system.log: [fine 2019/03/27 22:22:22.817 PDT locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba] processing request JoinRequestMessage(rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004) failureDetectionPort:43747
      locatorgemfire_2_2_29835/system.log: [fine 2019/03/27 22:22:22.817 PDT locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba] processing request JoinRequestMessage(rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec>:41002) failureDetectionPort:52188
      
      locatorgemfire_2_2_29835/system.log: [info 2019/03/27 22:22:22.818 PDT locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba] preparing new view View[rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001|60] members: [rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_30052:30052)<ec><v25>:41007{lead}, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_4_host2_31300:31300:locator)<ec><v29>:41003, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_1_host2_31671:31671:locator)<ec><v41>:41000, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_31856:31856)<ec><v42>:41006, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_32560:32560)<ec><v44>:41005, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v60>:41004, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec><v60>:41002]
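
      The duplicate in the view above can be spotted mechanically by comparing member IDs with the view ID marker removed. The sketch below does this on member strings in the form printed in the log. DuplicateMemberCheck and its methods are hypothetical helper names, and stripping the "<vN>" and "{lead}" markers with a regular expression is an assumption about the log format based only on the lines shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical log-analysis helper; not part of Geode. */
final class DuplicateMemberCheck {

    /** Removes the "<vN>" view ID and the "{lead}" marker from a member ID string. */
    static String withoutViewId(String memberId) {
        return memberId.replaceAll("<v\\d+>|\\{lead\\}", "");
    }

    /** Returns each member (view ID stripped) that occurs more than once, in first-seen order. */
    static List<String> duplicates(List<String> members) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String member : members) {
            counts.merge(withoutViewId(member), 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

      Applied to the member list of view 60, a check like this would report the peergemfire_2_1_host2_616 entry, which appears once with v46 and once with v60.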
      
      


          People

            Assignee: Bruce J Schuchardt (bschuchardt)
            Reporter: Bruce J Schuchardt (bschuchardt)