[KAFKA-514] Replication with Leader Failure Test: Log segment files checksum mismatch - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Duplicate
Affects Version/s: 0.8.0
Fix Version/s: 0.8.0
Component/s: None
Labels:
- replication-testing

Description

Test Description:

1. Produce and consume messages to 1 topic and 3 partitions.
2. This test sends 10 messages every 2 sec to 3 replicas.
3. At the end, it verifies the log size and contents, and uses a consumer to verify that there is no message loss.
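
The message-loss check in step 3 can be sketched as below. This is a minimal illustration, not the system test's actual code: it assumes each message carries a unique ID that can be extracted from the producer and consumer logs, and the function name `find_message_loss` is hypothetical.

```python
from collections import Counter


def find_message_loss(produced_ids, consumed_ids):
    """Compare produced and consumed message IDs (hypothetical helper).

    Returns a tuple (missing, duplicated):
      missing    - IDs that were produced but never consumed
      duplicated - IDs that were consumed more often than produced
    """
    produced = Counter(produced_ids)
    consumed = Counter(consumed_ids)
    # Counter subtraction keeps only positive counts.
    missing = sorted((produced - consumed).elements())
    duplicated = sorted((consumed - produced).elements())
    return missing, duplicated


if __name__ == "__main__":
    missing, duplicated = find_message_loss([1, 2, 3], [1, 3, 3])
    print("missing:", missing, "duplicated:", duplicated)
```

Comparing IDs rather than only counts matters here: equal message counts can hide a lost message that is offset by a duplicate.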

The issue:
When the leader is terminated by a controlled failure (kill -15), the resulting log segment file sizes do not all match. The mismatched log segment appears in one of the partitions of the terminated broker. This is consistently reproducible from the system regression test for replication with the following configuration:

zookeeper: 1-node (local)
brokers: 3-node cluster (all local)
replication factor: 3
no. of topics: 1
no. of partitions: 2
iterations of leader failure: 1

Remarks:

It is rarely reproducible if the no. of partitions is 1.
Even though the file checksums do not match, the no. of messages in the producer and consumer logs is equal.

Test result (shown with log file checksum):

broker-1 :
test_1-0/00000000000000000000.kafka => 1690639555
test_1-1/00000000000000000000.kafka => 4068655384 <<<< not matching across all replicas

broker-2 :
test_1-0/00000000000000000000.kafka => 1690639555
test_1-1/00000000000000000000.kafka => 4068655384 <<<< not matching across all replicas

broker-3 :
test_1-0/00000000000000000000.kafka => 1690639555
test_1-1/00000000000000000000.kafka => 3530842923 <<<< not matching across all replicas
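
The per-replica comparison above can be automated along these lines. This is a sketch under stated assumptions: the report does not say which checksum tool produced the values, so `zlib.crc32` is used here only as a stand-in (its output differs from POSIX `cksum`), and the helper names `crc32_of_file` and `find_mismatched_segments` are hypothetical. The `reported` dictionary holds the values listed in this issue.

```python
import zlib
from collections import defaultdict


def crc32_of_file(path, chunk_size=1 << 20):
    """Compute an unsigned CRC-32 of a file, reading it in chunks.

    Assumption: CRC-32 is a stand-in; the original test's checksum
    algorithm is not stated in the report.
    """
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def find_mismatched_segments(checksums):
    """Group brokers by checksum for each segment file.

    checksums: {broker: {segment_path: checksum}}
    Returns {segment_path: {checksum: [brokers]}} for every segment
    whose replicas do not all share the same checksum.
    """
    by_segment = defaultdict(lambda: defaultdict(list))
    for broker, segments in checksums.items():
        for segment, checksum in segments.items():
            by_segment[segment][checksum].append(broker)
    return {
        segment: dict(groups)
        for segment, groups in by_segment.items()
        if len(groups) > 1
    }


# Checksums exactly as reported in this issue.
reported = {
    "broker-1": {
        "test_1-0/00000000000000000000.kafka": 1690639555,
        "test_1-1/00000000000000000000.kafka": 4068655384,
    },
    "broker-2": {
        "test_1-0/00000000000000000000.kafka": 1690639555,
        "test_1-1/00000000000000000000.kafka": 4068655384,
    },
    "broker-3": {
        "test_1-0/00000000000000000000.kafka": 1690639555,
        "test_1-1/00000000000000000000.kafka": 3530842923,
    },
}

mismatches = find_mismatched_segments(reported)
```

For the values in this report, only the `test_1-1` segment is flagged, with broker-1 and broker-2 agreeing and broker-3 differing.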

Errors:
The following error is found in the terminated leader:

[2012-09-14 11:07:05,217] WARN No previously checkpointed highwatermark value found for topic test_1 partition 1. Returning 0 as the highwatermark (kafka.server.HighwaterMarkCheckpoint)
[2012-09-14 11:07:05,220] ERROR Replica Manager on Broker 3: Error processing leaderAndISR request LeaderAndIsrRequest(1,,true,1000,Map((test_1,1) -> { "ISR": "1,2","leader": "1","leaderEpoch": "0" }, (test_1,0) -> { "ISR": " 1,2","leader": "1","leaderEpoch": "1" })) (kafka.server.ReplicaManager)
kafka.common.KafkaException: End index must be segment list size - 1
at kafka.log.SegmentList.truncLast(SegmentList.scala:82)
at kafka.log.Log.truncateTo(Log.scala:471)
at kafka.cluster.Partition.makeFollower(Partition.scala:171)
at kafka.cluster.Partition.makeLeaderOrFollower(Partition.scala:126)
at kafka.server.ReplicaManager.kafka$server$ReplicaManager$$makeFollower(ReplicaManager.scala:195)
at kafka.server.ReplicaManager$$anonfun$becomeLeaderOrFollower$2.apply(ReplicaManager.scala:154)
at kafka.server.ReplicaManager$$anonfun$becomeLeaderOrFollower$2.apply(ReplicaManager.scala:144)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.Iterator$class.foreach(Iterator.scala:631)
at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:144)
at kafka.server.KafkaApis.handleLeaderAndISRRequest(KafkaApis.scala:73)
at kafka.server.KafkaApis.handle(KafkaApis.scala:60)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:40)
at java.lang.Thread.run(Thread.java:662)

Attachments

- testcase_2.tar (10 kB), John Fung, 04/Oct/12 23:24
- system_test_output_archive.tar.gz (81 kB), John Fung, 04/Oct/12 23:31
- kafka-514-reproduce-issue.patch (330 kB), John Fung, 05/Oct/12 22:43
- kafka-514_v2.patch (5 kB), Jun Rao, 06/Oct/12 04:29
- kafka-514_v1.patch (2 kB), Jun Rao, 05/Oct/12 23:05

People

Assignee: Unassigned
Reporter: John Fung
Votes: 0
Watchers: 2

Dates

Created: 14/Sep/12 19:39
Updated: 04/Dec/12 23:52
Resolved: 04/Dec/12 23:52