[ZOOKEEPER-2899] Zookeeper not receiving packets after ZXID overflows - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.4.5
Fix Version/s: None
Component/s: leaderElection
Labels: None
Environment:

5 host ensemble, 1500+ client connections each, 300K+ nodes
OS: Ubuntu precise
Java 7
JuniperQFX510048T NIC, 10000Mb/s, ixgbe driver
6-core Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
4 HDD 600G each

Description

ZK was used with Kafka (version 0.10.0) for coordination. We had a lot of Kafka consumers writing consumption offsets to ZK.

We observed the issue twice within the last year. Each time, after the ZXID overflowed, ZK stopped receiving packets, even though the logs showed leader election completing successfully and the ZK servers were up. As a result, the whole Kafka system came to a halt.
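For readers unfamiliar with what "ZXID overflows" means here: a ZXID is a 64-bit value whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch transaction counter. To my understanding, once the counter reaches its maximum the leader steps down and forces a re-election so that a new epoch starts the counter again. The sketch below only illustrates that bit layout; `ZxidSketch` and its method names are hypothetical and are not ZooKeeper's own API.

```java
/** Illustrative only: models the epoch/counter split of a 64-bit ZXID. */
final class ZxidSketch {
    // Low 32 bits of a ZXID carry the per-epoch transaction counter.
    private static final long COUNTER_MASK = 0xffffffffL;

    /** High 32 bits: the leader epoch. */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Low 32 bits: the transaction counter within the epoch. */
    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    /** Combines an epoch and a counter into one 64-bit ZXID. */
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & COUNTER_MASK);
    }

    /** True when the counter cannot be incremented without spilling into the epoch bits. */
    static boolean counterExhausted(long zxid) {
        return counterOf(zxid) == COUNTER_MASK;
    }

    /** Number of transactions left before the counter is exhausted. */
    static long writesRemaining(long zxid) {
        return COUNTER_MASK - counterOf(zxid);
    }

    public static void main(String[] args) {
        long zxid = makeZxid(5, COUNTER_MASK);
        System.out.println("epoch=" + epochOf(zxid)
                + " counter=" + counterOf(zxid)
                + " exhausted=" + counterExhausted(zxid));
    }
}
```

With a write-heavy workload such as many consumers committing offsets, the roughly 4.29 billion transactions available per epoch can be used up within months, which would be consistent with seeing the overflow twice in a year.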

In an attempt to reproduce (and hopefully fix) the issue, I set up test ZK and Kafka clusters and fed them production-like test traffic. Although I was not able to reproduce the issue, I did see that the Kafka consumers, which use ZK clients, essentially DoSed the ensemble: they filled up the `submittedRequests` queue in `PrepRequestProcessor`, pushing even read latencies above 100 ms.
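The queue build-up described above can be illustrated with a small deterministic model. It assumes `submittedRequests` behaves like an unbounded FIFO queue drained at a fixed rate, which matches my understanding of the 3.4 code but is not confirmed by this report. The class `BacklogSketch`, its methods, and all rates used with it are hypothetical and are not measurements from the ensemble.

```java
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative only: shows how an unbounded request queue grows when arrivals outpace service. */
final class BacklogSketch {

    /**
     * Simulates `ticks` time steps. In each step `arrivalsPerTick` requests are
     * enqueued and at most `servedPerTick` are dequeued. Returns the final depth.
     */
    static int depthAfter(int arrivalsPerTick, int servedPerTick, int ticks) {
        LinkedBlockingQueue<Long> submitted = new LinkedBlockingQueue<>();
        long nextId = 0;
        for (int t = 0; t < ticks; t++) {
            for (int i = 0; i < arrivalsPerTick; i++) {
                submitted.offer(nextId++); // unbounded queue: offer never rejects
            }
            for (int i = 0; i < servedPerTick; i++) {
                if (submitted.poll() == null) {
                    break; // queue drained before the service budget was used up
                }
            }
        }
        return submitted.size();
    }

    /** Approximate wait for a request arriving behind `depth` queued requests. */
    static double waitMillis(int depth, int servedPerTick, double tickMillis) {
        return depth * tickMillis / servedPerTick;
    }

    public static void main(String[] args) {
        int depth = depthAfter(12, 10, 100);
        System.out.println("depth=" + depth + " waitMs=" + waitMillis(depth, 10, 1.0));
    }
}
```

Because reads and writes share the same queue in this model, a read that arrives behind a backlog of writes waits for all of them, which is one plausible way a write-heavy client population could inflate read latency.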

More details are included in the comments.

Attachments

- GC_metric.png, 61 kB, 15/Sep/17 17:40, Yicheng Fang
- image12.png, 182 kB, 15/Sep/17 16:58, Yicheng Fang
- image13.png, 141 kB, 15/Sep/17 16:58, Yicheng Fang
- message_in_per_sec.png, 415 kB, 15/Sep/17 17:40, Yicheng Fang
- metric_volume.png, 229 kB, 15/Sep/17 17:44, Yicheng Fang
- zk_20170309_wo_noise.log, 48 kB, 15/Sep/17 17:07, Yicheng Fang

People

Assignee: Unassigned
Reporter: Yicheng Fang
Votes: 0
Watchers: 3

Dates

Created: 15/Sep/17 01:00
Updated: 05/Oct/17 18:18