[ZOOKEEPER-2953] Flaky Test: testNoLogBeforeLeaderEstablishment - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5.3, 3.4.11, 3.6.0
Fix Version/s: 3.5.4, 3.6.0, 3.4.12
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

testNoLogBeforeLeaderEstablishment has been flaky on 3.4, 3.5, and master for quite a while. My understanding is that the purpose of the test is to make sure that a server receives support from the quorum before changing the epoch and acting as leader.
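The property the test is meant to protect, "support from the quorum" before acting as leader, can be illustrated with a minimal majority check. This is a sketch, not ZooKeeper's implementation: the class `QuorumCheck` and the method `hasQuorum` are hypothetical names, and the sketch assumes a simple majority ensemble in which every voter has equal weight.

```java
import java.util.Set;

public class QuorumCheck {
    // Hypothetical helper (not a ZooKeeper API): returns true when the servers that
    // acknowledged the new epoch form a strict majority of the ensemble.
    // Acks from ids outside the ensemble are ignored.
    static boolean hasQuorum(Set<Long> ackedIds, Set<Long> ensemble) {
        long acked = ackedIds.stream().filter(ensemble::contains).count();
        return acked > ensemble.size() / 2;
    }

    public static void main(String[] args) {
        Set<Long> ensemble = Set.of(1L, 2L, 3L);
        // A lone server's own ack is not enough to lead a 3-server ensemble.
        System.out.println(hasQuorum(Set.of(1L), ensemble));
        // Two of three servers form a majority.
        System.out.println(hasQuorum(Set.of(1L, 2L), ensemble));
    }
}
```

A server that declares itself leader while `hasQuorum` would still be false is the "election without quorum support" situation the test is concerned with.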

There are a couple of issues with the test in its current state. First, the assertions the test makes are not always true. If the ZooKeeper database is not cleared, a follower can be ahead of the leader when the quorum is shut down, and that follower will then likely become leader when the quorum is restarted. This is the cause of the flaky behavior. Second, the test does not appear to create the conditions it wants to test for. Since ZOOKEEPER-335 (specifically the ZOOKEEPER-1081 subtask), FastLeaderElection takes the epoch into consideration, so the server in the test no longer "believes it is the leader once it recovers".
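Why a follower that is ahead can win after a restart can be sketched with a simplified version of the ordering FastLeaderElection uses to compare votes: a higher epoch wins, then a higher last zxid, then a higher server id. This is a sketch under assumptions: the `VoteOrder` class, the `Vote` record, and the `supersedes` method are hypothetical names, and ZooKeeper's real comparison handles details this omits (for example, server weights).

```java
public class VoteOrder {
    // Hypothetical vote record: proposed leader id, its last logged zxid, and its epoch.
    record Vote(long id, long zxid, long epoch) {}

    // Returns true if `candidate` should replace `current` as the preferred vote:
    // a higher epoch wins; on a tie, a higher zxid wins; on a further tie, a higher id wins.
    static boolean supersedes(Vote candidate, Vote current) {
        if (candidate.epoch() != current.epoch()) return candidate.epoch() > current.epoch();
        if (candidate.zxid() != current.zxid()) return candidate.zxid() > current.zxid();
        return candidate.id() > current.id();
    }

    public static void main(String[] args) {
        // Previous leader (id 3) and a follower (id 1) whose log got further ahead
        // before the quorum was shut down; both are in epoch 1.
        Vote previousLeader = new Vote(3, 0x100000005L, 1);
        Vote aheadFollower = new Vote(1, 0x100000007L, 1);
        System.out.println(supersedes(aheadFollower, previousLeader)); // true
    }
}
```

Server 1 is preferred over server 3 purely because its last zxid is higher. This matches the flaky case described above: unless the database is cleared between runs, the server that leads after the restart is not necessarily the one the test expects.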

After discussing the issue offline with phunt we decided it would still be valuable to test the situation where a server is elected leader without the support of the quorum. So I removed testNoLogBeforeLeaderEstablishment and created a new test called testElectionFraud.


Issue Links

links to

GitHub Pull Request #432

GitHub Pull Request #433

GitHub Pull Request #434


People

Assignee: Abraham Fine
Reporter: Abraham Fine
Votes: 0
Watchers: 3

Dates

Created: 13/Dec/17 00:24
Updated: 16/Dec/17 02:05
Resolved: 16/Dec/17 00:49