Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4.0
    • Component/s: server
    • Labels:
      None
    • Release Note:
      Committed revision 1135382.

      Description

      the leader and follower code need to be modified to correctly handle and log epoch changes

      1. sample.diff
        3 kB
        Vishal Kher
      2. ZOOKEEPER-1081.patch
        29 kB
        Benjamin Reed

        Activity

        Benjamin Reed created issue -
        Benjamin Reed made changes -
        Field Original Value New Value
        Fix Version/s 3.4.0 [ 12314469 ]
        Benjamin Reed made changes -
        Component/s server [ 12312382 ]
        Hide
        Benjamin Reed added a comment -

        this is a patch that addresses the newleader logging issue, and brings the implementation in line with the theoretically proven protocol.

        there is a problem with hierarchal quorums that i haven't looked at. the backwards compatibility code is in, but i haven't tested it. i'm still trying to make leader election mod in a backward compatible way.

        the code LearnerHandler code is pretty messy now and i'm torn between cleaning it up or leaving until later. my concerns about cleaning is that 1) it may be hard to compare past logic with current if we need to in the near future and 2) it may introduce new bugs. on the pro side, it is crying out for the code cleanup, and this is usually when we do it.

        Show
        Benjamin Reed added a comment - this is a patch that addresses the newleader logging issue, and brings the implementation in line with the theoretically proven protocol. there is a problem with hierarchal quorums that i haven't looked at. the backwards compatibility code is in, but i haven't tested it. i'm still trying to make leader election mod in a backward compatible way. the code LearnerHandler code is pretty messy now and i'm torn between cleaning it up or leaving until later. my concerns about cleaning is that 1) it may be hard to compare past logic with current if we need to in the near future and 2) it may introduce new bugs. on the pro side, it is crying out for the code cleanup, and this is usually when we do it.
        Benjamin Reed made changes -
        Attachment ZOOKEEPER-1081.patch [ 12481323 ]
        Hide
        Benjamin Reed added a comment -

        oh btw, the test case was super hard to write. the scenario is easy to describe, but i found it quite hard to implement. if someone (vishal? sees a better way to do the test, i would be really happy to try it.

        Show
        Benjamin Reed added a comment - oh btw, the test case was super hard to write. the scenario is easy to describe, but i found it quite hard to implement. if someone (vishal? sees a better way to do the test, i would be really happy to try it.
        Hide
        Vishal Kher added a comment -

        Hi Ben,

        I have attached the diff that I used to repro the bug.

        I inserted a failure in ProposalRequestProcessor.java. Everytime I read file /etc/zookeeper/debug.property. I check if the debug property is set. If it is, then I fail (I did shutdown -h to avoid some timing issues specific to our application) the leader after call to syncProcessor.processRequest(request). This ensures that the transaction is logged but not sent to followers. Then, when we restart the old leader we can check if it still has the old data.

        Do you think this helps? I think we need to start injecting faults in the code itself. So having a framwork that would do something similar to the attached diff would be useful.

        Show
        Vishal Kher added a comment - Hi Ben, I have attached the diff that I used to repro the bug. I inserted a failure in ProposalRequestProcessor.java. Everytime I read file /etc/zookeeper/debug.property. I check if the debug property is set. If it is, then I fail (I did shutdown -h to avoid some timing issues specific to our application) the leader after call to syncProcessor.processRequest(request). This ensures that the transaction is logged but not sent to followers. Then, when we restart the old leader we can check if it still has the old data. Do you think this helps? I think we need to start injecting faults in the code itself. So having a framwork that would do something similar to the attached diff would be useful.
        Vishal Kher made changes -
        Attachment sample.diff [ 12481380 ]
        Hide
        Benjamin Reed added a comment -

        yeah, we really should be injecting faults. i think there are nice fault injectors that you can use without modifying your code right? have you (or anyone else listening) ever tried them?

        Show
        Benjamin Reed added a comment - yeah, we really should be injecting faults. i think there are nice fault injectors that you can use without modifying your code right? have you (or anyone else listening) ever tried them?
        Hide
        Mahadev konar added a comment -

        Would mockito be helpful?

        Show
        Mahadev konar added a comment - Would mockito be helpful?
        Hide
        Camille Fournier added a comment -

        I dunno that mockito would do enough, but some of my colleagues had good luck with MultiThreadedTestCase:

        http://code.google.com/p/multithreadedtc/

        I didn't use it myself but it looked pretty cool.

        Show
        Camille Fournier added a comment - I dunno that mockito would do enough, but some of my colleagues had good luck with MultiThreadedTestCase: http://code.google.com/p/multithreadedtc/ I didn't use it myself but it looked pretty cool.
        Benjamin Reed made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Release Note Committed revision 1135382.
        Resolution Fixed [ 1 ]
        Hide
        Hudson added a comment -

        Integrated in ZooKeeper-trunk #1215 (See https://builds.apache.org/job/ZooKeeper-trunk/1215/)

        Show
        Hudson added a comment - Integrated in ZooKeeper-trunk #1215 (See https://builds.apache.org/job/ZooKeeper-trunk/1215/ )
        Patrick Hunt made changes -
        Assignee Benjamin Reed [ breed ]
        Mahadev konar made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Benjamin Reed
            Reporter:
            Benjamin Reed
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development