Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0
    • Component/s: quorum, server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I was doing some fault injection testing of 3.2.1 with ZOOKEEPER-508 patch applied and noticed that after some time the ensemble failed to re-elect a leader.

      See the attached log files - 5 member ensemble. typically 5 is the leader

      Notice that after 16:23:50,525 no quorum is formed, even after 20 minutes elapses w/no quorum

      environment:

      I was doing fault injection testing using aspectj. The faults are injected into socketchannel read/write, I throw exceptions randomly at a 1/200 ratio (rand.nextFloat() <= .005 => throw IOException

      You can see when a fault is injected in the log via:
      2009-08-19 16:57:09,568 - INFO [Thread-74:ReadRequestFailsIntermittently@38] - READPACKET FORCED FAIL

      vs a read/write that didn't force fail:
      2009-08-19 16:57:09,568 - INFO [Thread-74:ReadRequestFailsIntermittently@41] - READPACKET OK

      otw standard code/config (straight fle quorum with 5 members)

      also see the attached jstack trace. this is for one of the servers. Notice in particular that the number of sendworkers != the number of recv workers.

      1. jst.txt
        10 kB
        Patrick Hunt
      2. log3_debug.tar.gz
        820 kB
        Patrick Hunt
      3. logs.tar.gz
        229 kB
        Patrick Hunt
      4. logs2.tar.gz
        599 kB
        Patrick Hunt
      5. t5_aj.tar.gz
        1.42 MB
        Patrick Hunt
      6. ZOOKEEPER-512.patch
        5 kB
        Flavio Junqueira
      7. ZOOKEEPER-512.patch
        5 kB
        Flavio Junqueira
      8. ZOOKEEPER-512.patch
        2 kB
        Flavio Junqueira
      9. ZOOKEEPER-512.patch
        0.9 kB
        Flavio Junqueira
      10. ZOOKEEPER-512.patch
        3 kB
        Flavio Junqueira

        Activity

          People

          • Assignee:
            Flavio Junqueira
            Reporter:
            Patrick Hunt
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development