ZooKeeper / ZOOKEEPER-1465

Cluster availability following new leader election takes a long time with large datasets - is correlated to dataset size

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.4.3
    • Fix Version/s: 3.4.4, 3.5.0
    • Component/s: leaderElection
    • Labels:
      None

      Description

      When a cluster re-elects a leader, it takes a long time for the cluster to become available again if the dataset is large.

      Test Data
      ----------
      650 MB snapshot size
      20k nodes of varied size
      3 member cluster

      On 3.4.x branch (http://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4?r=1244779)
      ------------------------------------------------------------------------------------------

      Takes 3-4 minutes to bring up a cluster from cold
      Takes 40-50 secs to recover from a leader failure
      Takes 10 secs for a new follower to join the cluster

      Using the 3.3.5 release on the same hardware with the same dataset
      -----------------------------------------------------------------

      Takes 10-20 secs to bring up a cluster from cold
      Takes 10 secs to recover from a leader failure
      Takes 10 secs for a new follower to join the cluster

      The 3.4.x logs show that once a new leader is elected, it pushes a new snapshot to each of the followers. Each follower must save that snapshot before it acks the leader, and only then can the leader mark the cluster as available.

      The hardware being used is a low-spec VM, so the absolute times are not relevant per se. What matters is that a snapshot is always sent even though there is no difference between the persisted state on each peer.
      No data is being added to the cluster while the peers are being restarted.
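
To make the expected behaviour concrete, here is a minimal sketch of how a leader could choose a synchronization mode for a follower by comparing transaction ids (zxids). It is not the actual ZooKeeper code or the attached patch: the class `SyncModeSketch`, the method `chooseSyncMode`, and its parameters are hypothetical names used only for illustration. The point is that a follower whose persisted state already matches the leader's should not need a full snapshot.

```java
// Illustrative sketch only; names are hypothetical and do not mirror ZooKeeper internals.
public class SyncModeSketch {

    /** Ways a leader can bring a follower up to date. */
    enum SyncMode { DIFF, TRUNC, SNAP }

    /**
     * Picks a sync mode from the follower's last zxid.
     *
     * @param peerLastZxid    last zxid persisted by the follower
     * @param leaderLastZxid  last zxid processed by the leader
     * @param minCommittedLog oldest zxid the leader still holds in its in-memory log (assumed)
     * @param maxCommittedLog newest zxid the leader holds in its in-memory log (assumed)
     */
    static SyncMode chooseSyncMode(long peerLastZxid, long leaderLastZxid,
                                   long minCommittedLog, long maxCommittedLog) {
        if (peerLastZxid == leaderLastZxid) {
            // States are identical: an empty diff is enough, no snapshot transfer.
            return SyncMode.DIFF;
        }
        if (peerLastZxid > leaderLastZxid) {
            // Follower is ahead of the leader: it must drop the extra transactions.
            return SyncMode.TRUNC;
        }
        if (peerLastZxid >= minCommittedLog && peerLastZxid <= maxCommittedLog) {
            // Follower is behind, but the leader still has the missing transactions.
            return SyncMode.DIFF;
        }
        // Follower is too far behind: fall back to a full snapshot.
        return SyncMode.SNAP;
    }

    public static void main(String[] args) {
        // Identical state, as in the restart scenario described above.
        System.out.println(chooseSyncMode(100L, 100L, 0L, 0L));
        // Slightly behind, within the leader's retained log.
        System.out.println(chooseSyncMode(90L, 100L, 80L, 100L));
        // Far behind, outside the leader's retained log.
        System.out.println(chooseSyncMode(10L, 100L, 80L, 100L));
        // Ahead of the leader.
        System.out.println(chooseSyncMode(120L, 100L, 80L, 100L));
    }
}
```

In the scenario reported here, no data is written while peers restart, so every follower would fall into the first branch and the 650 MB snapshot transfer would be avoided.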

      1. ZOOKEEPER-1465.patch
        2 kB
        Camille Fournier
      2. ZOOKEEPER-1465.patch
        6 kB
        Thawan Kooburat
      3. ZOOKEEPER-1465.patch
        13 kB
        Flavio Junqueira
      4. ZOOKEEPER-1465.patch
        13 kB
        Flavio Junqueira
      5. ZOOKEEPER-1465_br34.patch
        14 kB
        Camille Fournier


          People

          • Assignee:
            Camille Fournier
            Reporter:
            Alex Gvozdenovic
          • Votes:
            0
            Watchers:
            12
