ZOOKEEPER-880 (project: ZooKeeper)

QuorumCnxManager$SendWorker grows without bounds

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to the point where the OS runs out of native threads. At the same time we see a lot of exceptions in the logs. This is on 3.2.2, and our config looks like:

      tickTime=3000
      dataDir=/somewhere_thats_not_tmp
      clientPort=2181
      initLimit=10
      syncLimit=5
      server.0=sv4borg9:2888:3888
      server.1=sv4borg10:2888:3888
      server.2=sv4borg11:2888:3888
      server.3=sv4borg12:2888:3888
      server.4=sv4borg13:2888:3888
      

      The issue is on the first server (sv4borg9). I'm going to attach thread dumps and logs in a moment.
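      One rough way to tell a leak like this apart from normal behaviour is to compare the number of SendWorker threads with the number of peers in the configuration. The sketch below is a hypothetical helper (SendWorkerCheck and its methods are not part of ZooKeeper). It assumes that a healthy QuorumCnxManager keeps at most one SendWorker per peer, so each server in the five-server ensemble above should show at most four, and that SendWorker thread names begin with "SendWorker" (check this against a jstack dump from your version). ThreadMXBean only sees the JVM it runs in, so the count is meaningful only inside a running quorum peer; from outside, counting matching names in a thread dump serves the same purpose.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Properties;

// Hypothetical diagnostic helper, not part of ZooKeeper.
public class SendWorkerCheck {

    // Counts the server.N entries in zoo.cfg-style text. Assuming at most one
    // SendWorker per peer, a server should need no more than (servers - 1) of them.
    static int expectedMaxSendWorkers(String cfgText) {
        Properties props = new Properties();
        try {
            // '=' is the first separator on each line, so "server.0" is parsed as the key.
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int servers = 0;
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                servers++;
            }
        }
        return Math.max(servers - 1, 0);
    }

    // Counts live threads in *this* JVM whose names start with the given prefix.
    static int countThreadsWithPrefix(String prefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int count = 0;
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that exited after the ids were sampled.
            if (info != null && info.getThreadName().startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The five-server ensemble from the report.
        String cfg = String.join("\n",
                "tickTime=3000",
                "clientPort=2181",
                "server.0=sv4borg9:2888:3888",
                "server.1=sv4borg10:2888:3888",
                "server.2=sv4borg11:2888:3888",
                "server.3=sv4borg12:2888:3888",
                "server.4=sv4borg13:2888:3888");
        int expected = expectedMaxSendWorkers(cfg);
        // Assumption: SendWorker thread names begin with "SendWorker".
        int actual = countThreadsWithPrefix("SendWorker");
        System.out.println("SendWorker threads: " + actual + " (expected at most " + expected + ")");
        if (actual > expected) {
            System.out.println("Thread count exceeds peer count; possible leak.");
        }
    }
}
```

      Run standalone, the program reports zero SendWorker threads against an expected maximum of four. On the affected server the same comparison would show the count climbing well past four.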

      Attachments

        1. ZOOKEEPER-trunk-880
          14 kB
          Vishal Kher
        2. ZOOKEEPER-880-3.3.patch
          8 kB
          Vishal Kher
        3. ZOOKEEPER-880.patch
          2 kB
          Vishal Kher
        4. ZOOKEEPER-880.patch
          8 kB
          Vishal Kher
        5. ZOOKEEPER-880.patch
          8 kB
          Vishal Kher
        6. TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
          562 kB
          Jean-Daniel Cryans
        7. jstack
          36 kB
          Jean-Daniel Cryans
        8. hbase-hadoop-zookeeper-sv4borg9.log.gz
          4 kB
          Jean-Daniel Cryans
        9. hbase-hadoop-zookeeper-sv4borg12.log.gz
          3 kB
          Jean-Daniel Cryans

            People

              Assignee: Vishal Kher (vishalmlst)
              Reporter: Jean-Daniel Cryans (jdcryans)
              Votes: 0
              Watchers: 3