Geode / GEODE-2534

concurrently started locators fail to create a unified system

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: locator
    • Labels: None

    Description

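      During startup, a locator responded to a "find coordinator" request before it knew its own identity. Because of this, it answered subsequent requests differently while other locators were starting concurrently: in the log below, its first response names locator-default-2 as coordinator (senderId=null), while a later response names locator-default-0 itself. As a result, it created its own distributed system while the locator that received the initial response created a different one.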

      [fine 2017/02/23 15:32:02.031 UTC locator-default-0 <main> tid=0x1] LogWriter is created.
      
      [fine 2017/02/23 15:32:02.031 UTC locator-default-0 <main> tid=0x1] Responding to a property change event. Property name is config.
      
      [info 2017/02/23 15:32:02.886 UTC locator-default-0 <main> tid=0x1] Peer locator is connecting to local membership services
      
      [fine 2017/02/23 15:32:02.887 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator: coordinator from registrations is 10.85.100.166(locator-default-2:8706:locator)<ec>:49152
      
      [fine 2017/02/23 15:32:02.887 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator returning FindCoordinatorResponse(coordinator=10.85.100.166(locator-default-2:8706:locator)<ec>:49152, fromView=false, viewId=nul, registrants=1, senderId=null, network partition detection enabled=true, locators preferred as coordinators=true)
      
      [info 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] Starting membership services
      
      [fine 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] starting Authenticator
      
      [fine 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] starting Messenger
      
      ...
      
      [fine 2017/02/23 15:32:03.369 UTC locator-default-0 <main> tid=0x1] All membership services have been started
      
      [fine 2017/02/23 15:32:03.369 UTC locator-default-0 <main> tid=0x1] join timeout is set to 24000
      
      [fine 2017/02/23 15:32:03.370 UTC locator-default-0 <main> tid=0x1] searching for the membership coordinator
      
      [fine 2017/02/23 15:32:03.370 UTC locator-default-0 <main> tid=0x1] sending FindCoordinatorRequest(memberID=10.85.100.165(locator-default-0:8873:locator)<ec>:49152, rejected=[], lastViewId=-1) to [/10.85.100.165:55221, /10.85.100.166:55221, /10.85.100.167:55221]
      
      ...
      
      [fine 2017/02/23 15:32:03.376 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator: coordinator from registrations is 10.85.100.165(locator-default-0:8873:locator)<ec>:49152
      
      [fine 2017/02/23 15:32:03.376 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator returning FindCoordinatorResponse(coordinator=10.85.100.165(locator-default-0:8873:locator)<ec>:49152, fromView=false, viewId=nul, registrants=2, senderId=10.85.100.165(locator-default-0:8873:locator)<ec>:49152, network partition detection enabled=true, locators preferred as coordinators=true)
      

      The locator should not respond to requests to find the coordinator before it knows its own identity.
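
      The expected behavior can be illustrated with a minimal sketch. Everything in it is hypothetical: the class PeerLocatorSketch, its methods, and the "lowest ID wins" selection rule are simplifications for illustration and are not Geode's actual classes or its actual fix. It only shows the idea of withholding a coordinator answer until the local identity is known, so that every answer is computed from a candidate set that includes the locator itself.

```java
import java.util.Optional;
import java.util.TreeSet;

/** Hypothetical stand-in for a peer locator; not Geode's real implementation. */
public class PeerLocatorSketch {
    // IDs of members that have asked this locator for a coordinator.
    private final TreeSet<String> registrants = new TreeSet<>();
    // Stays null until membership services have started and assigned an identity.
    private String localId;

    /** Called once this locator knows its own member ID. */
    public synchronized void setLocalId(String id) {
        localId = id;
        registrants.add(id);
    }

    /**
     * Handles a find-coordinator request. While the local identity is unknown
     * the request is recorded but no coordinator is named, so the requester
     * has to retry rather than act on an answer that omits this locator.
     */
    public synchronized Optional<String> findCoordinator(String requesterId) {
        registrants.add(requesterId);
        if (localId == null) {
            return Optional.empty();
        }
        // Assumed, simplified selection rule: the lowest ID is the coordinator.
        return Optional.of(registrants.first());
    }

    public static void main(String[] args) {
        PeerLocatorSketch locator = new PeerLocatorSketch();
        // Request arrives before the identity is known: no coordinator is named.
        System.out.println(locator.findCoordinator("locator-default-2"));
        locator.setLocalId("locator-default-0");
        // Same request after startup: the answer now accounts for the local member.
        System.out.println(locator.findCoordinator("locator-default-2"));
    }
}
```

      In this sketch the early request gets an empty answer instead of a coordinator that might later be contradicted, which is the property the description asks for. Whether the real fix rejects, delays, or otherwise handles early requests is not stated in this issue.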

          People

            Assignee: Unassigned
            Reporter: Bruce J Schuchardt (bschuchardt)