Apache Helix / HELIX-652

Double assignment when a participant is unable to connect to the ZooKeeper quorum

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.1, 0.6.4
    • Fix Version/s: None
    • Component/s: helix-core
    • Labels: None

    Description

      The setup is as follows.

      • Helix: 0.7.1
      • ZooKeeper: 3.3.4
      • State model: OnlineOffline
      • Controller: leader elected from one of the cluster nodes
      • A single resource with partitions
      • Full-auto rebalancer
      • ZooKeeper quorum: 3 nodes

      When one participant loses its ZooKeeper connection (it cannot connect to any of the ZooKeeper servers; the typical cause we saw was a switch failure on that rack or a network switch failure on a node):

      ----> The partition (P1) for which this participant (say node N1) is online is still maintained on N1.

      Meanwhile, because N1 loses its ephemeral node in ZooKeeper, the rebalancer is triggered and reallocates partition P1 to another participant (say node N2), which becomes online at time T1.

      ----> After this, both N1 and N2 act as online for the same partition (P1).

      As soon as the participant on N1 is able to re-establish its ZooKeeper connection, at time T2:
      ----> Reset is called on the partition in the participant on N1.

      Double assignment:
      The question is: is it expected behavior that both N1 and N2 can be online for the same partition (P1) between T1 and T2?
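
      The timeline above can be restated as a small toy model, purely to make the T1 to T2 window explicit. This is a sketch under stated assumptions: it does not use any Helix or ZooKeeper API, and the class name DoubleAssignmentTimeline and its methods are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the reported timeline; hypothetical, no Helix or ZooKeeper calls. */
class DoubleAssignmentTimeline {
    // Nodes that currently believe they are online for partition P1.
    private final Set<String> onlineOwners = new TreeSet<>();
    private final List<String> log = new ArrayList<>();

    // A node takes the OFFLINE -> ONLINE transition for P1.
    void bringOnline(String node, String when) {
        onlineOwners.add(node);
        record(when, node + " goes online");
    }

    // The node loses its ZooKeeper connection. Its local state is left
    // unchanged, which is the behavior described in this report.
    void loseConnection(String node, String when) {
        record(when, node + " loses ZooKeeper connection (local state kept)");
    }

    // The node reconnects and reset is called on its partition.
    void reset(String node, String when) {
        onlineOwners.remove(node);
        record(when, node + " reconnects, reset called");
    }

    boolean isDoublyAssigned() {
        return onlineOwners.size() > 1;
    }

    List<String> log() {
        return log;
    }

    private void record(String when, String event) {
        log.add(when + ": " + event + " -> online for P1: " + onlineOwners);
    }

    public static void main(String[] args) {
        DoubleAssignmentTimeline t = new DoubleAssignmentTimeline();
        t.bringOnline("N1", "T0");
        t.loseConnection("N1", "T0+");
        t.bringOnline("N2", "T1"); // rebalancer reassigns P1 once N1's ephemeral node is gone
        t.reset("N1", "T2");
        t.log().forEach(System.out::println);
    }
}
```

      In this model, isDoublyAssigned() is true only between the T1 and T2 events, which corresponds to the window the question asks about.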

People

    • Assignee: Unassigned
    • Reporter: subramanian raghunathan