  Kafka / KAFKA-5007

Kafka Replica Fetcher Thread: Resource Leak


    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0.0, 0.10.1.1, 0.10.2.0
    • Fix Version/s: None
    • Component/s: core, network
    • Labels:
    • Environment:
      CentOS 7
      Java 8

      Description

      Kafka runs out of open file descriptors when the system network interface is down.

      Issue description:
      We have a Kafka cluster of 5 nodes running version 0.10.1.1. The open file descriptor limit for the account running Kafka is set to 100000.
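To confirm which open-file limit a running broker actually inherited (the limit configured for the account and the limit applied to the live process can differ), the "Max open files" row of /proc/<pid>/limits can be read. The following is a minimal sketch, assuming a Linux host with /proc mounted; the function names are hypothetical and not part of Kafka or any tool mentioned in this report.

```python
import re
from typing import Optional, Tuple

# Matches the "Max open files" row of /proc/<pid>/limits on Linux, e.g.
# "Max open files            100000               100000               files"
_MAX_OPEN_FILES = re.compile(r"^Max open files\s+(\S+)\s+(\S+)", re.MULTILINE)


def parse_max_open_files(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return the (soft, hard) open-file limits as strings; either may be "unlimited"."""
    match = _MAX_OPEN_FILES.search(limits_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def read_max_open_files(pid: int) -> Optional[Tuple[str, str]]:
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())
```

Calling read_max_open_files with the broker PID should report the soft limit the broker will hit first; reading another user's /proc entry may require running as that user or as root.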

      During an upgrade, the network interface went down. The outage lasted 12 hours, and eventually all the brokers crashed with java.io.IOException: Too many open files.

      We repeated the test in a lower environment and observed that the open socket count keeps increasing while the NIC is down.
      We have around 13 topics, with a maximum of 120 partitions per topic, and the number of replica fetcher threads (num.replica.fetchers) is set to 8.

      Using an internal monitoring tool, we observed that the number of open socket descriptors for the broker PID continued to increase while the NIC was down, eventually leading to the "Too many open files" error.
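The internal monitoring tool used in the report is not named. As a rough stand-in, the growth it observed can be reproduced by periodically counting the entries in /proc/<pid>/fd whose symlink target has the form "socket:[inode]". This is a minimal sketch, assuming a Linux host; count_socket_fds and watch_socket_fds are hypothetical helpers, not part of Kafka.

```python
import os
import time
from typing import List


def count_socket_fds(fd_dir: str) -> int:
    """Count entries in fd_dir whose symlink target looks like 'socket:[inode]'.

    On Linux, fd_dir is typically /proc/<pid>/fd: each entry is a symlink,
    and socket descriptors point at 'socket:[<inode>]'.
    """
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:["):
            count += 1
    return count


def watch_socket_fds(pid: int, interval_s: float = 60.0, samples: int = 10) -> List[int]:
    """Sample the socket descriptor count of a process at a fixed interval."""
    fd_dir = f"/proc/{pid}/fd"
    history: List[int] = []
    for i in range(samples):
        history.append(count_socket_fds(fd_dir))
        if i < samples - 1:
            time.sleep(interval_s)
    return history
```

Running watch_socket_fds against the broker PID while the NIC is down should show whether the socket count keeps climbing toward the open-file limit instead of levelling off once reconnect attempts fail.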

        Attachments

        1. lsofzookeeper.txt
          34 kB
          Joseph Aliase
        2. lsofkafka.txt
          4.85 MB
          Joseph Aliase
        3. jstack-zoo.out
          20 kB
          Joseph Aliase
        4. jstack-kafka.out
          119 kB
          Joseph Aliase


            People

            • Assignee:
              Unassigned
              Reporter:
              joseph.aliase07@gmail.com Joseph Aliase
            • Votes:
              2
              Watchers:
              17

              Dates

              • Created:
                Updated: