GEODE-5560

member becomes coordinator but then stops when it receives a view

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0-incubating, 1.1.0, 1.1.1, 1.2.1, 1.3.0, 1.4.0, 1.5.0, 1.6.0
    • Fix Version/s: 1.7.0
    • Component/s: membership

    Description

      In a test run that aggressively shuts down and restarts locators, I saw a member become the membership coordinator and then receive a new view from the old coordinator. This caused it to shut down its view-creator thread and give up the role of coordinator. It stayed in this state for over 5 minutes, until the test was killed.

      [info 2018/08/07 23:21:06.655 PDT peerZoneDgemfire2_host1_28017 <Pooled High Priority Message Processor 21> tid=0x102] This member is becoming the membership coordinator with address rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneDgemfire2_host1_28017:28017)<ec><v8>:1038
      
      [info 2018/08/07 23:21:06.660 PDT peerZoneDgemfire2_host1_28017 <Pooled High Priority Message Processor 21> tid=0x102] ViewCreator starting on:rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneDgemfire2_host1_28017:28017)<ec><v8>:1038
      
      [info 2018/08/07 23:21:06.696 PDT peerZoneDgemfire2_host1_28017 <Pooled High Priority Message Processor 21> tid=0x102] Member at rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneBgemfire1_host1_27853:27853)<ec><v3>:1030 gracefully left the distributed cache: shutdown message received
      
      [info 2018/08/07 23:21:06.726 PDT peerZoneDgemfire2_host1_28017 <Geode Membership View Creator> tid=0x323] View Creator thread is starting
      
      [info 2018/08/07 23:21:06.726 PDT peerZoneDgemfire2_host1_28017 <unicast receiver,rs-FullRegression08042427a0i3large-hydra-client-104-51513> tid=0x28] received new view: View[rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneBgemfire1_host1_27853:27853)<ec><v3>:1030|36] members: [rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneBgemfire1_host1_27853:27853)<ec><v3>:1030{lead}, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneBgemfire1_host1_27876:27876)<ec><v3>:1029, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneCgemfire1_host1_27947:27947)<ec><v5>:1033, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneCgemfire1_host1_27932:27932)<ec><v6>:1034, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneCgemfire2_host1_27970:27970)<ec><v6>:1036, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneCgemfire2_host1_27959:27959)<ec><v6>:1035, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneDgemfire1_host1_27985:27985)<ec><v7>:1037, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneDgemfire2_host1_28017:28017)<ec><v8>:1038, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneDgemfire2_host1_28033:28033)<ec><v10>:1040]  shutdown: [rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneDgemfire1_host1_28001:28001)<ec><v9>:1039, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneAgemfire1_host1_27819:27819)<ec><v1>:1025, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneAgemfire2_host1_27844:27844)<ec><v2>:1026, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneAgemfire2_host1_27834:27834)<ec><v2>:1027, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneAgemfire1_host1_27826:27826)<ec><v2>:1028, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneBgemfire2_host1_27898:27898)<ec><v3>:1031, rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneBgemfire2_host1_27917:27917)<ec><v4>:1032]
      
      [info 2018/08/07 23:21:07.400 PDT peerZoneDgemfire2_host1_28017 <vm_15_thr_71_peerZoneD2_host1_28017> tid=0x311] Connection: shared=false ordered=true failed to connect to peer rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneBgemfire1_host1_27876:27876)<ec><v3>:1029 because: java.net.ConnectException: Connection refused
      
      [warning 2018/08/07 23:21:09.400 PDT peerZoneDgemfire2_host1_28017 <vm_15_thr_71_peerZoneD2_host1_28017> tid=0x311] Connection: Attempting reconnect to peer  rs-FullRegression08042427a0i3large-hydra-client-104(peerZoneBgemfire1_host1_27876:27876)<ec><v3>:1029
      

      GMSJoinLeave.installView() needs to perform a check similar to the one in GMSJoinLeave.processLeaveRequest(): it should not abdicate its role as coordinator if the creator of the view is queued up to be removed from membership.
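
      A minimal sketch of the proposed check, for illustration only. CoordinatorSketch, View, pendingRemovals and the short member names are hypothetical stand-ins, not Geode's actual GMSJoinLeave code or data structures. The sketch assumes the member keeps a set of addresses with a queued leave or removal request and consults it before giving up the coordinator role.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the coordinator logic; NOT Geode's GMSJoinLeave. */
class CoordinatorSketch {

    /** Minimal view: the member that created it, its id, and its members. */
    static final class View {
        final String creator;
        final int viewId;
        final List<String> members;

        View(String creator, int viewId, List<String> members) {
            this.creator = creator;
            this.viewId = viewId;
            this.members = members;
        }
    }

    private final String localAddress;
    private boolean coordinator;
    // Assumption: members with a queued leave/removal request not yet applied to a view.
    private final Set<String> pendingRemovals = new HashSet<>();

    CoordinatorSketch(String localAddress) {
        this.localAddress = localAddress;
    }

    void becomeCoordinator() {
        coordinator = true;
    }

    void queueRemoval(String member) {
        pendingRemovals.add(member);
    }

    boolean isCoordinator() {
        return coordinator;
    }

    /** Handles a received view and decides whether to keep the coordinator role. */
    void installView(View view) {
        if (!coordinator || view.creator.equals(localAddress)) {
            return; // not coordinator, or this is our own view: nothing to decide
        }
        if (pendingRemovals.contains(view.creator)) {
            // The view's creator is on its way out, so stay coordinator.
            return;
        }
        // Another live member is issuing views: give up the role.
        coordinator = false;
    }

    public static void main(String[] args) {
        CoordinatorSketch member = new CoordinatorSketch("member-28017");
        member.becomeCoordinator();
        member.queueRemoval("member-27853"); // shutdown message from the old coordinator
        member.installView(new View("member-27853", 36,
                Arrays.asList("member-27853", "member-28017")));
        System.out.println("still coordinator: " + member.isCoordinator());
    }
}
```

      The main method replays the order of events in the log above: the shutdown message from the old coordinator arrives first, then its view 36. With the check in place, the member keeps the coordinator role instead of stopping its view creator.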

            People

              Assignee: Unassigned
              Reporter: Bruce J Schuchardt (bschuchardt)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m