HBase / HBASE-11935

ZooKeeper connection storm after queue failover with slave cluster down

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Implemented
    • Fix Version/s: 0.99.0, 0.94.23, 0.98.6, 2.0.0

    Description

      We just ran into a production incident with TCP SYN storms on port 2181 (ZooKeeper).

      In our case the slave cluster was not running. When we bounced the primary cluster, we saw an "unbounded" number of failover threads all hammering the slave cluster's ZK hosts (which were not running ZK at the time), degrading overall network performance between datacenters.

      Looking at the code, we noticed that the thread pool handling of the failover workers probably does not behave as intended.

      Patch coming soon.
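
      To illustrate the general idea of capping failover concurrency, here is a minimal sketch. It is not the patch for this issue and does not use HBase code: the class name `BoundedFailoverPool`, the method `peakConcurrency`, and the sleep that stands in for a ZooKeeper connection attempt are all hypothetical. It assumes a fixed-size `ThreadPoolExecutor` backed by a queue, so that extra failover tasks wait their turn instead of each getting a thread of its own.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from HBase.
public class BoundedFailoverPool {

    /**
     * Submits `tasks` simulated failover workers to a pool of at most
     * `maxWorkers` threads and returns the highest number of workers
     * that were observed running at the same time.
     */
    public static int peakConcurrency(int tasks, int maxWorkers) {
        // core size == max size, so the pool never grows past maxWorkers;
        // surplus tasks wait in the queue instead of spawning threads.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxWorkers, maxWorkers, 100, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        // Let idle threads exit so the pool shrinks to zero when quiet.
        pool.allowCoreThreadTimeOut(true);

        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // stand-in for a connection attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent workers: " + peakConcurrency(50, 4));
    }
}
```

      With a bound like this, the number of simultaneous connection attempts against an unreachable peer is limited by the pool size rather than by the number of queued failovers. Whether a fixed bound is the right policy for replication failover is a design decision this sketch does not settle.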

          People

            Assignee: Unassigned
            Reporter: Lars Hofhansl (larsh)
            Votes: 0
            Watchers: 7
