ZooKeeper / ZOOKEEPER-3911

Data inconsistency caused by DIFF sync uncommitted log

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.5.4, 3.6.0, 3.4.12, 3.4.13, 3.5.5, 3.5.6, 3.5.7, 3.6.1, 3.5.8
    • Fix Version/s: 3.6.3, 3.7.0
    • Component/s: quorum, server

      Description

      Since version 3.4, followers and the leader do not flush the synchronized data to disk immediately when synchronization completes, so for a short time the data is not yet persisted to files. During this window the ZooKeeper server can already serve external clients (for example, a web application). If a failure occurs at this point, phantom reads may occur.

      There is an example at this link: [example|https://drive.google.com/file/d/1jy3kkVQTDYGb4iV1RaPMBbEWLZZltTQG/view?usp=sharing]

      ----------------mail list----------------

      mail response from hanm@apache.org

      Hi Xun,

      I think this is a bug, your test case is sound to me. Do you mind
      creating a JIRA for this issue?

      Followers should not ACK NEWLEADER without ACKing every transaction from
      the DIFF sync. To ACK every transaction, a follower either persists the
      transaction in the log, or takes a snapshot before sending the ACK of
      NEWLEADER (which we did before ZOOKEEPER-2678, where the snapshot
      optimization was introduced).

      A potential fix I have in mind is to make sure to persist all DIFF sync
      proposals from the leader (similar to what we are already doing for
      proposals coming between NEWLEADER and UPTODATE). By doing so, when the
      leader receives the NEWLEADER ACK from a quorum, it is guaranteed that
      every transaction the leader DIFF synced to followers is quorum committed.
      Thus there will not be inconsistent views moving forward. Alternatively, we
      can take a snapshot before ACKing NEWLEADER, but that will be a big
      performance hit for big data trees.

      I am also interested to hear what others think about this.
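
      The ordering proposed above (make DIFF proposals durable before ACKing
      NEWLEADER) can be illustrated with a toy model. This is a sketch under
      stated assumptions, not ZooKeeper's actual Learner code: the class
      DiffSyncSketch, the Follower type, and all method names are hypothetical,
      and the "log" is just an in-process list standing in for the on-disk
      transaction log.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a follower during DIFF sync; all names here are hypothetical. */
final class DiffSyncSketch {

    /** 'log' stands in for the on-disk txn log, 'memory' for the in-memory data tree. */
    static final class Follower {
        private final boolean persistDiffBeforeAck;
        private final List<Long> log = new ArrayList<>();         // survives a crash
        private final List<Long> memory = new ArrayList<>();      // lost on a crash
        private final List<Long> pendingDiff = new ArrayList<>(); // received, not yet applied

        Follower(boolean persistDiffBeforeAck) {
            this.persistDiffBeforeAck = persistDiffBeforeAck;
        }

        /** A proposal received from the leader as part of the DIFF. */
        void onDiffProposal(long zxid) {
            pendingDiff.add(zxid);
        }

        /** Handles NEWLEADER; returning true models sending the ACK. */
        boolean onNewLeader() {
            for (long zxid : pendingDiff) {
                memory.add(zxid);
                if (persistDiffBeforeAck) {
                    log.add(zxid); // proposed fix: make the txn durable before the ACK
                }
            }
            pendingDiff.clear();
            return true;
        }

        /** Crash and restart: only what reached the log is recovered. */
        void crashAndRestart() {
            memory.clear();
            memory.addAll(log);
        }

        boolean has(long zxid) {
            return memory.contains(zxid);
        }
    }

    public static void main(String[] args) {
        long zxidN = 0x100000005L; // arbitrary illustrative zxid
        for (boolean persist : new boolean[] {false, true}) {
            Follower f = new Follower(persist);
            f.onDiffProposal(zxidN);
            f.onNewLeader();
            f.crashAndRestart();
            System.out.println("persistDiffBeforeAck=" + persist
                    + " -> follower has zxid_n after restart: " + f.has(zxidN));
        }
    }
}
```

      With the flag off, the follower ACKs NEWLEADER but loses zxid_n on
      restart, which is the inconsistency reported here. With the flag on, the
      ACK implies durability, so a quorum of ACKs implies a quorum of durable
      copies.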

      On Fri, Aug 28, 2020 at 12:20 AM li xun <274952496@qq.com> wrote:

       

      There is an example at the link below. Does it make clear what I mean?

      https://drive.google.com/file/d/1jy3kkVQTDYGb4iV1RaPMBbEWLZZltTQG/view?usp=sharing

      Since version 3.4, followers and the leader do not flush the synchronized
      data to disk immediately when synchronization completes, so for a short
      time the data is not yet persisted to files. During this window the zk
      server can already serve external clients (for example, a web
      application). If a failure occurs at this point, phantom reads may occur.

      On Aug 28, 2020, at 14:51, Justin Ling Mao <maoling199210191@sina.com> wrote:

       

      @李珣 The situation you describe may reflect a conceptual misunderstanding
      of how the consensus protocol works:

      ---> Since the data of the follower when the follower uses the DIFF method
      to synchronize with the leader is still in the memory, it has not had time
      to persist

      1. The write path is: write to the transaction log (WAL) first, then,
      after reaching consensus, apply to memory, not the other way around.

      ---> but at this time, the latest zxid_n of the leader has not been
      supported by the quorum of the follower. At this time, if a client
      connects to the leader and sees zxid_n,

      2. If a write has not been supported by the quorum, it is not safe to
      apply it to the state machine, and the client is not able to see this
      write.

      I guess that your question may be: how does the system handle the
      uncommitted logs when the leader changes?
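
      The write path in point 1 (log first, then consensus, then apply) can be
      sketched as follows. This is a minimal illustration, not ZooKeeper's real
      request pipeline: WritePathSketch, Server, and propose are hypothetical
      names, a strict majority is assumed as the quorum rule, and the "WAL" is
      an in-process list.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the log-then-commit write path; all names here are hypothetical. */
final class WritePathSketch {

    /** 'wal' models the transaction log, 'applied' the in-memory state machine. */
    static final class Server {
        final List<Long> wal = new ArrayList<>();
        final List<Long> applied = new ArrayList<>();

        /** Step 1: log the proposal; returning true models sending an ACK. */
        boolean logProposal(long zxid) {
            wal.add(zxid);
            return true;
        }

        /** Step 3: apply a committed transaction; it must already be in the log. */
        void commit(long zxid) {
            if (!wal.contains(zxid)) {
                throw new IllegalStateException("commit before log: " + zxid);
            }
            applied.add(zxid);
        }
    }

    /**
     * Proposes zxid to the first 'reachable' servers of the ensemble.
     * Returns true only if a strict majority logged it and it was then applied.
     */
    static boolean propose(long zxid, List<Server> ensemble, int reachable) {
        int acks = 0;
        for (int i = 0; i < Math.min(reachable, ensemble.size()); i++) {
            if (ensemble.get(i).logProposal(zxid)) {
                acks++;
            }
        }
        if (acks <= ensemble.size() / 2) {
            return false; // step 2 failed: no quorum, so nothing is applied or visible
        }
        for (int i = 0; i < acks; i++) {
            ensemble.get(i).commit(zxid);
        }
        return true;
    }

    public static void main(String[] args) {
        List<Server> ensemble = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            ensemble.add(new Server());
        }
        System.out.println("1 of 3 reachable, committed: " + propose(1L, ensemble, 1));
        System.out.println("2 of 3 reachable, committed: " + propose(2L, ensemble, 2));
    }
}
```

      In this model a transaction that only one of three servers logged stays in
      that server's log but is never applied, so no client can observe it.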

       

      ----- Original Message -----
      From: Ted Dunning <ted.dunning@gmail.com>
      To: dev@zookeeper.apache.org
      Subject: Re: May violate the ZAB agreement – version 3.6.1
      Date: 2020-08-28 01:25

      How is it that participant A would have a later zxid than the leader?
      In particular, it seems to me that it should be impossible to have these
      two facts be true:
      1) a transaction has been committed with zxid = z_0. This implies that a
      quorum of the cluster has accepted this transaction and it has been
      committed.
      2) a new leader election nominates a leader with latest zxid < z_0.
      My reasoning is that any new leader election has to involve a quorum and
      at least a sufficient number of that quorum must have accepted zxid >= z_0
      and therefore would refuse to be part of the quorum (this is a
      contradiction).

      Thus, no leader could be elected with zxid < z_0 if fact (1) is true.
      What you are describing seems to require both of these facts.
      Perhaps I am missing something about your suggested scenario. Could you
      describe what you are thinking in more detail?
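
      The counting step behind the argument above is that any two majority
      quorums of the same ensemble share at least one member. A small
      brute-force check of that property, assuming strict-majority quorums
      (the class and method names are hypothetical, chosen for illustration):

```java
/** Brute-force check that any two strict majorities of n servers overlap. */
final class QuorumIntersectionSketch {

    /** A set of servers is encoded as a bitmask over n servers. */
    static boolean isMajority(int mask, int n) {
        return Integer.bitCount(mask) > n / 2;
    }

    /** Returns true if every pair of majority subsets of n servers intersects. */
    static boolean allMajoritiesIntersect(int n) {
        int limit = 1 << n;
        for (int a = 0; a < limit; a++) {
            if (!isMajority(a, n)) {
                continue;
            }
            for (int b = 0; b < limit; b++) {
                if (isMajority(b, n) && (a & b) == 0) {
                    return false; // found two disjoint majorities
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println("n=" + n + " all majorities intersect: "
                    + allMajoritiesIntersect(n));
        }
    }
}
```

      The check only covers set membership. It says nothing about whether a
      server that acknowledged a transaction still has it after a crash, which
      is the question this thread goes on to examine.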
      On Thu, Aug 27, 2020 at 2:08 AM 李珣 <274952496@qq.com> wrote:

       

      version 3.6.1
      org.apache.zookeeper.server.quorum.Learner.java line:605
      Suppose the following situation occurs.
      zxid_n is the largest zxid of Participant A (which has just recovered from
      downtime), and zxid_n has not yet been acknowledged by a quorum. Assume
      Participant A is elected leader, and a follower then uses DIFF to
      synchronize data with it. After sending UPTODATE, the leader can already
      serve external clients, but at this time the leader's latest zxid_n has
      not been backed by a quorum of followers. Suppose a client now connects to
      the leader and sees zxid_n, and then both the leader and the follower go
      down. For some reason the leader cannot be restarted, while the follower
      starts normally, so a new leader can only be elected from among the
      followers. Because the data the follower received through DIFF
      synchronization was still only in memory and had not yet been persisted,
      the newly elected leader does not have the data for zxid_n. But zxid_n was
      already seen by a client on the old leader, so the data view becomes
      inconsistent.
      Is the above situation possible?

       
       

        Attachments

        1. example.png
          1.83 MB
          lixun

              People

              • Assignee: Michael Han (hanm)
              • Reporter: lixun (fregatte)
              • Votes: 0
              • Watchers: 6

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 7h