  1. Geode
  2. GEODE-2534

concurrently started locators fail to create a unified system


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: locator
    • Labels: None

    Description

      During startup, a locator responded to a "find coordinator" request before it knew its own identity. Once it learned its identity, it answered subsequent requests differently during concurrent locator startup. As a result, it formed its own distributed system, while the locator that had received the initial response formed a different one.

      [fine 2017/02/23 15:32:02.031 UTC locator-default-0 <main> tid=0x1] LogWriter is created.
      
      [fine 2017/02/23 15:32:02.031 UTC locator-default-0 <main> tid=0x1] Responding to a property change event. Property name is config.
      
      [info 2017/02/23 15:32:02.886 UTC locator-default-0 <main> tid=0x1] Peer locator is connecting to local membership services
      
      [fine 2017/02/23 15:32:02.887 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator: coordinator from registrations is 10.85.100.166(locator-default-2:8706:locator)<ec>:49152
      
      [fine 2017/02/23 15:32:02.887 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator returning FindCoordinatorResponse(coordinator=10.85.100.166(locator-default-2:8706:locator)<ec>:49152, fromView=false, viewId=nul, registrants=1, senderId=null, network partition detection enabled=true, locators preferred as coordinators=true)
      
      [info 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] Starting membership services
      
      [fine 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] starting Authenticator
      
      [fine 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] starting Messenger
      
      ...
      
      [fine 2017/02/23 15:32:03.369 UTC locator-default-0 <main> tid=0x1] All membership services have been started
      
      [fine 2017/02/23 15:32:03.369 UTC locator-default-0 <main> tid=0x1] join timeout is set to 24000
      
      [fine 2017/02/23 15:32:03.370 UTC locator-default-0 <main> tid=0x1] searching for the membership coordinator
      
      [fine 2017/02/23 15:32:03.370 UTC locator-default-0 <main> tid=0x1] sending FindCoordinatorRequest(memberID=10.85.100.165(locator-default-0:8873:locator)<ec>:49152, rejected=[], lastViewId=-1) to [/10.85.100.165:55221, /10.85.100.166:55221, /10.85.100.167:55221]
      
      ...
      
      [fine 2017/02/23 15:32:03.376 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator: coordinator from registrations is 10.85.100.165(locator-default-0:8873:locator)<ec>:49152
      
      [fine 2017/02/23 15:32:03.376 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator returning FindCoordinatorResponse(coordinator=10.85.100.165(locator-default-0:8873:locator)<ec>:49152, fromView=false, viewId=nul, registrants=2, senderId=10.85.100.165(locator-default-0:8873:locator)<ec>:49152, network partition detection enabled=true, locators preferred as coordinators=true)
      

      The locator should not respond to requests to find the coordinator before it knows its own identity.
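
The requirement above can be illustrated with a minimal sketch. This is not Geode's locator code: the class `CoordinatorGate`, its methods, and the "lowest ID wins" coordinator rule are all hypothetical stand-ins chosen for illustration. The only point it shows is that a request arriving before the local identity is known gets no answer (so the caller must retry), rather than an answer computed from partial state.

```java
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch, not Geode's actual implementation.
final class CoordinatorGate {
    // Stays null until membership services have assigned this locator an identity.
    private String localId;
    // IDs of every member that has asked this locator for a coordinator, plus itself.
    private final SortedSet<String> registrants = new TreeSet<>();

    /** Called once the locator knows its own member ID. */
    synchronized void setLocalId(String id) {
        localId = id;
        registrants.add(id);
    }

    /**
     * Handles a "find coordinator" request. The requester is always recorded,
     * but no coordinator is returned while the local identity is unknown.
     */
    synchronized Optional<String> findCoordinator(String requesterId) {
        registrants.add(requesterId);
        if (localId == null) {
            return Optional.empty(); // caller is expected to retry later
        }
        // Assumed selection rule for this sketch: lowest ID among registrants.
        return Optional.of(registrants.first());
    }
}
```

With this guard, the early request seen in the log at 15:32:02.887 would receive an empty result instead of a coordinator chosen before the locator had registered itself, and every later request would be answered from the same complete set of registrants.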


          People

            Assignee: Unassigned
            Reporter: Bruce J Schuchardt (bschuchardt)
            Votes: 0
            Watchers: 2
