Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.2
    • Fix Version/s: 3.6.0, 3.5.5
    • Component/s: leaderElection
    • Labels:
      None

      Description

      Copying relevant comments:

      Vishal K added a comment - 02/Nov/10 02:09 PM
      Hi Flavio,

      I have a suggestion for changing the blocking IO code in QuorumCnxManager. It keeps the current code structure and requires a small amount of changes. I am not sure if these comments should go in ZOOKEEPER-901. ZOOKEEPER-901 is probably addressing netty as well. Please feel free to close this JIRA if you intend to make all the changes as a part of ZOOKEEPER-901.

      Basically we jusy need to move parts of initiateConnection and receiveConnection to SenderWorker and ReceiveWorker.

      A. Current flow for receiving connection:
      1. accept connection in Listener.run()
      2. receiveConnection()

      • Read remote server's ID
      • Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID).
      • kill current set of SenderWorker and ReciveWorker threads
      • Start a new pair

      B Current flow for initiating connection:
      1. In connectOne(), connect if not already connected. else return.
      2. send my ID to the remote server
      3. if my ID < remote server disconnect and return
      4. if my ID > remote server

      • kill current set of SenderWorker and ReceiveWorkter threads for the remote server
      • Start a new pair

      Proposed changes:
      Move the code that performs any blocking IO in SenderWorker and ReceiveWorker.

      A. Proposed flow for receiving connection:
      1. accept connection in Listener.run()
      2. receiveConnection()

      • kill current set of SenderWorker and ReciveWorker threads
      • Start a new pair

      Proposed changed to SenderWorker:

      • Read remote server's ID
      • Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID).
      • Proceed to normal operation

      B Proposed flow for initiating connection:
      1. in connectOne(), return if already connected
      2. Start a new SenderWorker and ReceiveWorker pair
      2. In SenderWorker

      • connect to remote server
      • write my ID
      • if my ID < remote server disconnect and return (shutdown the pair).
      • Proceed to normal operation

      Questions:

      • In QuorumCnxManager, is it necessary to kill the current pair and restart a new one every time we receive a connect request?
      • In receiveConnection we may choose to reject an accepted connection if a thread in
        SenderWorker is in the process of connecting. Otherwise a server with ID <
        remote server may keep sending frequent connect request that will result in the
        remote server closing connections for this peer. But I think we add a delay
        before sending notifications, which might be good enough to prevent this
        problem.

      Let me know what you think about this. I can also help with the implementation.

      Flavio Junqueira added a comment - 03/Nov/10 05:28 PM
      Hi Vishal, I like your proposal, it seems reasonable and not difficult to implement.

      On your questions:

      1. I don't think it is necessary to kill a pair SenderWorker/RecvWorker every time, and I'd certainly support changing it;
      2. I'm not sure where you're suggesting to introduce a delay. In the FLE code, a server sends a new batch of notifications if it changes its vote or if it times out waiting for a new notification. This timeout value increases over time. I was actually thinking that we should reset the timeout value upon receiving a notification. I think this is a bug....

      Given that it is your proposal, I'd be happy to let you take a stab at it and help you out if you need a hand. Does it make sense for you?

        Attachments

        1. ZOOKEEPER-932.patch
          36 kB
          Vishal Kher
        2. ZOOKEEPER-932.patch
          56 kB
          Vishal Kher
        3. ZOOKEEPER-932.patch
          56 kB
          Vishal Kher
        4. ZOOKEEPER-932.patchv3
          71 kB
          Vishal Kher
        5. ZOOKEEPER-932.patchv3
          71 kB
          Vishal Kher

          Activity

            People

            • Assignee:
              vishalmlst Vishal Kher
              Reporter:
              vishalmlst Vishal Kher
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: