ZooKeeper / ZOOKEEPER-2899

Zookeeper not receiving packets after ZXID overflows


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.5
    • Fix Version/s: None
    • Component/s: leaderElection
    • Labels: None
    • Environment: 5-host ensemble, 1500+ client connections each, 300K+ nodes
      OS: Ubuntu Precise
      Java 7
      Juniper QFX5100-48T NIC, 10000 Mb/s, ixgbe driver
      6-core Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
      4 HDDs, 600 GB each

    Description

      ZK was used with Kafka (version 0.10.0) for coordination. Many Kafka consumers wrote their consumption offsets to ZK.
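Every offset commit stored in ZK is a write, and each ZK write transaction is assigned one ZXID, whose low 32 bits are a per-epoch counter. A heavy offset-commit load therefore shortens the time until that counter runs out. The back-of-the-envelope sketch below assumes a constant write rate and one ZXID per write; the class `ZxidBudget` and the rates in `main` are hypothetical and not measurements from this cluster.

```java
// Hypothetical helper (not ZooKeeper code): how long one ZooKeeper epoch
// lasts if every write, e.g. a consumer offset commit, consumes one ZXID.
final class ZxidBudget {
    // The low 32 bits of a ZXID are a per-epoch counter, so one epoch can
    // number at most 2^32 transactions.
    static final long COUNTER_VALUES = 1L << 32;

    /** Seconds until the counter is exhausted at a constant write rate. */
    static double secondsToExhaust(double writesPerSecond) {
        if (writesPerSecond <= 0) {
            throw new IllegalArgumentException("write rate must be positive");
        }
        return COUNTER_VALUES / writesPerSecond;
    }

    /** Same as secondsToExhaust, expressed in days. */
    static double daysToExhaust(double writesPerSecond) {
        return secondsToExhaust(writesPerSecond) / 86_400.0;
    }

    public static void main(String[] args) {
        // Hypothetical sustained write rates, for illustration only.
        for (double rate : new double[] {1_000, 10_000, 50_000}) {
            System.out.printf("%,.0f writes/s -> %.1f days%n", rate, daysToExhaust(rate));
        }
    }
}
```

At a hypothetical 1,000 writes/s the counter lasts about 49.7 days; at 10,000 writes/s, about 5 days.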

      We observed the issue twice within the last year. Each time, after the ZXID overflowed, ZK stopped receiving packets, even though the logs showed leader election completing successfully and the ZK servers were up. As a result, the whole Kafka system came to a halt.
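A ZXID is a 64-bit value: the high 32 bits hold the leader epoch and the low 32 bits a counter within that epoch. The sketch below, using a hypothetical `ZxidLayout` class that is not ZooKeeper's own code, shows why an exhausted counter cannot simply wrap around: wrapping within the same epoch would yield a ZXID smaller than its predecessor, whereas continuing needs a higher epoch, which in ZooKeeper is established by a new leader election. The epoch value 5 is arbitrary.

```java
// Hypothetical illustration of the ZXID layout: epoch in the high 32 bits,
// per-epoch counter in the low 32 bits.
final class ZxidLayout {
    static final long COUNTER_MASK = 0xffffffffL;

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & COUNTER_MASK);
    }

    /** True when no further transaction can be numbered in this epoch. */
    static boolean counterExhausted(long zxid) {
        return counterOf(zxid) == COUNTER_MASK;
    }

    public static void main(String[] args) {
        // Last ZXID of an arbitrary epoch 5.
        long last = makeZxid(5, COUNTER_MASK);
        System.out.println(Long.toHexString(last) + " exhausted=" + counterExhausted(last));

        // Wrapping the counter to 0 in the same epoch would go backwards.
        long wrapped = makeZxid(epochOf(last), 0);
        System.out.println("wrapped < last: " + (wrapped < last));

        // Moving on requires a higher epoch (in ZooKeeper, via a new election).
        long nextEpochFirst = makeZxid(epochOf(last) + 1, 0);
        System.out.println("next epoch > last: " + (nextEpochFirst > last));
    }
}
```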

      In an attempt to reproduce (and hopefully fix) the issue, I set up test ZK and Kafka clusters and fed them production-like test traffic. Although I was not able to reproduce the issue, I did see that the Kafka consumers, which use ZK clients, essentially DoSed the ensemble by filling up the `submittedRequests` queue in `PrepRequestProcessor`, causing read latencies of 100 ms or more.
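The latency effect of a full request queue can be illustrated with a simple FIFO model: if a single thread drains queued requests in order, a read arriving behind a backlog of N requests, each taking service time s, waits N × s before it is handled. The `BacklogLatency` class and the numbers in `main` are hypothetical; the model assumes reads share one FIFO with the queued requests and ignores the rest of ZooKeeper's request pipeline.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical FIFO model (not ZooKeeper code): one thread drains queued
// requests in order, each taking a fixed service time.
final class BacklogLatency {
    /** Closed form: wait for a request arriving behind `backlog` requests. */
    static long waitMicros(int backlog, long serviceMicros) {
        return backlog * serviceMicros;
    }

    /** Simulates draining the queue; returns when the tail read starts processing. */
    static long simulateMicros(int backlog, long serviceMicros) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < backlog; i++) {
            queue.addLast(i);          // requests already queued
        }
        queue.addLast(-1);             // our read, at the tail
        long clock = 0;
        while (!queue.isEmpty()) {
            int req = queue.pollFirst();
            if (req == -1) {
                return clock;          // the read finally reaches the head
            }
            clock += serviceMicros;    // one queued request is processed
        }
        throw new IllegalStateException("read was never dequeued");
    }

    public static void main(String[] args) {
        // Hypothetical values chosen only to show the arithmetic.
        int backlog = 10_000;
        long serviceMicros = 10;
        System.out.println(simulateMicros(backlog, serviceMicros) / 1000 + " ms");
    }
}
```

With a hypothetical backlog of 10,000 requests at 10 µs each, the read waits 100 ms before it is even processed.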

      More details are included in the comments.

      Attachments

        1. metric_volume.png (229 kB, Yicheng Fang)
        2. message_in_per_sec.png (415 kB, Yicheng Fang)
        3. GC_metric.png (61 kB, Yicheng Fang)
        4. zk_20170309_wo_noise.log (48 kB, Yicheng Fang)
        5. image12.png (182 kB, Yicheng Fang)
        6. image13.png (141 kB, Yicheng Fang)


          People

            Assignee: Unassigned
            Reporter: Yicheng Fang (eefangyicheng)
