[ZOOKEEPER-1489] Data loss after truncate on transaction log - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 3.4.3, 3.3.5
Fix Version/s: 3.3.6, 3.4.4, 3.5.0
Component/s: server
Labels: None
Environment: Tested on Ubuntu 12.04 and CentOS 6; should be reproducible elsewhere
Hadoop Flags: Reviewed

Description

The truncate method of the transaction log class org.apache.zookeeper.server.persistence.FileTxnLog shrinks the file to the required size without closing or repositioning the logStream (doing so could itself be dangerous, since the truncate method is not synchronized against concurrent writes to the log).

This causes the next append to that log to create a small "hole" in the file, which Java reads back as binary zeroes. As a result, the FileTxnIterator.next() implementation detects the end of the log file too early.
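The effect can be illustrated outside ZooKeeper with a minimal sketch. It is not the FileTxnLog code: the record format (a 4-byte length followed by the payload), the class name TruncateHoleDemo, and the reader's "zero length means end of log" rule are assumptions made for illustration only. The sketch writes two records through an open stream, truncates the file through a separate RandomAccessFile, appends a third record through the original stream, and then reads the file back.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TruncateHoleDemo {

    // Hypothetical record format: 4-byte length, then the payload bytes.
    static void append(DataOutputStream out, String payload) throws IOException {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
        out.flush();
    }

    // Simplified reader: a zero length is treated as the end of the log,
    // which is the assumption that makes a zero-filled hole look like EOF.
    static List<String> readAll(File f) throws IOException {
        List<String> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            while (true) {
                int len;
                try {
                    len = in.readInt();
                } catch (EOFException e) {
                    break; // real end of file
                }
                if (len == 0) {
                    break; // zeroes from the hole are mistaken for end of log
                }
                byte[] bytes = new byte[len];
                in.readFully(bytes);
                records.add(new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return records;
    }

    static List<String> run() throws IOException {
        File f = File.createTempFile("txnlog-demo", ".bin");
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            append(out, "one"); // occupies bytes 0..6
            append(out, "two"); // occupies bytes 7..13, stream offset is now 14

            // Truncate through a second handle, dropping record "two".
            // The offset of the still-open output stream is not adjusted.
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.setLength(7);
            }

            // This write lands at the stale offset 14, leaving bytes 7..13 as a hole.
            append(out, "three");
        }
        try {
            return readAll(f);
        } finally {
            f.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

On a typical Linux file system the reader should return only the first record: the third record is physically in the file, but it sits behind the zero-filled gap and is never reached. Closing and reopening the stream after truncation, or repositioning it to the new end of file, would avoid the gap in this sketch.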

I'll attach a small Maven project with one JUnit test that can be used to reproduce the issue. Because the test is black-box, it unfortunately runs for roughly 50 seconds.

Steps to reproduce:

1. Start an ensemble of ZooKeeper servers with at least 3 participants.
2. Create one entry, then temporarily remove one of the servers from the ensemble (e.g. zk-2).
3. Create another entry, which is therefore only reflected on zk-1 and zk-3.
4. Take zk-1 out of the ensemble without shutting it down (this is important; I did it by interrupting the network connection to that node) and clean zk-3.
5. Bring back zk-2 and zk-3 so that they form a quorum.
6. Allow zk-1 to connect again.
7. zk-1 will receive a TRUNC message from zk-2, since zk-1 is now in the minority as the only server that knows about the second node creation event.
8. Create a third node.
9. Force zk-1 to become the leader somehow.
10. The third node will be gone.

Attachments

ZOOKEEPER-1489.patch (04/Jul/12 00:26, 31 kB, Patrick D. Hunt)
ZOOKEEPER-1489.patch (11/Jul/12 18:07, 32 kB, Patrick D. Hunt)
ZOOKEEPER-1489.patch (11/Jul/12 22:20, 32 kB, Patrick D. Hunt)
ZOOKEEPER-1489.patch (12/Jul/12 06:22, 34 kB, Patrick D. Hunt)
ZOOKEEPER-1489_br34.patch (04/Jul/12 00:26, 31 kB, Patrick D. Hunt)
ZOOKEEPER-1489_br34.patch (11/Jul/12 18:07, 31 kB, Patrick D. Hunt)
ZOOKEEPER-1489_br34.patch (11/Jul/12 22:19, 32 kB, Patrick D. Hunt)
ZOOKEEPER-1489_br34.patch (12/Jul/12 06:18, 33 kB, Patrick D. Hunt)
ZOOKEEPER-1489_br33.patch (04/Jul/12 00:26, 29 kB, Patrick D. Hunt)
ZOOKEEPER-1489_br33.patch (11/Jul/12 18:07, 30 kB, Patrick D. Hunt)
ZOOKEEPER-1489_br33.patch (11/Jul/12 22:19, 30 kB, Patrick D. Hunt)
ZOOKEEPER-1489_br33.patch (12/Jul/12 06:18, 31 kB, Patrick D. Hunt)
TruncateTxLogCorruption.tgz (18/Jun/12 09:14, 7 kB, Christian Ziech)
TruncateTxLogCorruption.tgz (29/Jun/12 15:51, 7 kB, Christian Ziech)

People

Assignee: Patrick D. Hunt

Reporter: Christian Ziech

Votes: 0

Watchers: 8

Dates

Created: 18/Jun/12 09:09

Updated: 18/Jul/12 11:01

Resolved: 17/Jul/12 21:29