Kafka / KAFKA-7757

Too many open files after java.io.IOException: Connection to n was disconnected before the response was read


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      We upgraded a 3-broker cluster from 0.10.2.2 to 2.1.0.

      After a while (hours), 2 of the brokers start to throw:

      java.io.IOException: Connection to NN was disconnected before the response was read
      at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
      at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
      at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
      at kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:241)
      at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:130)
      at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:129)
      at scala.Option.foreach(Option.scala:257)
      at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
      at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
      at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
      

      File descriptors start to pile up, and if I do not restart the broker, it throws "Too many open files" and crashes.

      ERROR Error while accepting connection (kafka.network.Acceptor)
      java.io.IOException: Too many open files in system
      at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
      at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
      at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
      at kafka.network.Acceptor.accept(SocketServer.scala:460)
      at kafka.network.Acceptor.run(SocketServer.scala:403)
      at java.lang.Thread.run(Thread.java:748)
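The message "Too many open files in system" is the C library text for ENFILE, the system-wide file-handle limit, as opposed to EMFILE ("Too many open files"), the per-process limit. On Linux the system-wide counters live in /proc/sys/fs/file-nr. The following is a minimal sketch, assuming a Linux host and Java 8+, for sampling those counters; the class name SystemFileHandles is hypothetical and is not part of Kafka.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of Kafka). Linux only: /proc/sys/fs/file-nr holds
// three whitespace-separated numbers: allocated handles, allocated-but-unused
// handles, and the system-wide maximum (fs.file-max).
public final class SystemFileHandles {
    public final long allocated;
    public final long unused;
    public final long max;

    public SystemFileHandles(long allocated, long unused, long max) {
        this.allocated = allocated;
        this.unused = unused;
        this.max = max;
    }

    // Parses the raw contents of /proc/sys/fs/file-nr.
    public static SystemFileHandles parse(String contents) {
        String[] parts = contents.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields, got " + parts.length);
        }
        return new SystemFileHandles(
                Long.parseLong(parts[0]), Long.parseLong(parts[1]), Long.parseLong(parts[2]));
    }

    // Reads the live counters from procfs.
    public static SystemFileHandles read() throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get("/proc/sys/fs/file-nr"));
        return parse(new String(raw, StandardCharsets.US_ASCII));
    }

    // Fraction of the system-wide limit in use: (allocated - unused) / max.
    public double usedFraction() {
        return (double) (allocated - unused) / max;
    }

    public static void main(String[] args) throws IOException {
        SystemFileHandles h = read();
        System.out.printf("allocated=%d unused=%d max=%d used=%.2f%%%n",
                h.allocated, h.unused, h.max, 100.0 * h.usedFraction());
    }
}
```

Sampling this periodically on an affected host, next to the broker's own descriptor count, could show whether the broker process accounts for most of the growth toward the system-wide limit.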
      

       

       After a restart, the issue comes back after some hours. It has happened on all brokers, so it is not specific to one instance.
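Because the problem builds up over hours on every broker, a periodic breakdown of the broker's open descriptors by type could help show whether the growth is in sockets (which would fit the replica fetcher disconnect errors above) or in log segment files. The following is a minimal sketch, assuming Linux /proc semantics, where each entry in /proc/<pid>/fd is a symlink whose target is a path for regular files and socket:[inode] for sockets. FdCensus and its methods are hypothetical helpers, not Kafka tooling.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic (not part of Kafka). Linux only: every entry in
// /proc/<pid>/fd is a symlink whose target is a path for files and
// "socket:[inode]", "pipe:[inode]" or "anon_inode:..." for other handles.
public final class FdCensus {

    // Maps one link target to a coarse category.
    public static String classify(String target) {
        if (target.startsWith("socket:")) return "socket";
        if (target.startsWith("pipe:")) return "pipe";
        if (target.startsWith("anon_inode:")) return "anon_inode";
        if (target.startsWith("/")) return "file";
        return "other";
    }

    // Counts targets per category (sorted by category name).
    public static Map<String, Integer> tally(Iterable<String> targets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : targets) {
            counts.merge(classify(t), 1, Integer::sum);
        }
        return counts;
    }

    // Snapshot of one process; must run as the broker's user or as root.
    public static Map<String, Integer> census(long pid) throws IOException {
        List<String> targets = new ArrayList<>();
        Path fdDir = Paths.get("/proc", Long.toString(pid), "fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    targets.add(Files.readSymbolicLink(fd).toString());
                } catch (IOException e) {
                    // Descriptor closed between listing and reading; skip it.
                }
            }
        }
        return tally(targets);
    }

    // Usage: java FdCensus <broker-pid>  (prints one snapshot per minute)
    public static void main(String[] args) throws IOException, InterruptedException {
        long pid = Long.parseLong(args[0]);
        while (true) {
            System.out.println(Instant.now() + " " + census(pid));
            Thread.sleep(60_000L);
        }
    }
}
```

If the "socket" count climbs in step with the "Connection to NN was disconnected" errors while "file" stays roughly flat, that would point toward connections that are not being closed rather than toward log segments.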

       

      Attachments

        1. dump.txt (165 kB, arthur)
        2. fd-spike-threads.txt (79 kB, Jeff Nadler)
        3. image-2021-04-29-11-24-22-704.png (20 kB, luws)
        4. image-2021-04-29-11-25-41-208.png (17 kB, luws)
        5. image-2021-04-29-11-26-34-894.png (18 kB, luws)
        6. image-2021-04-29-11-27-12-924.png (18 kB, luws)
        7. image-2021-04-29-11-27-35-679.png (24 kB, luws)
        8. kafka-allocated-file-handles.png (12 kB, Mathias Kub)
        9. Screen Shot 2019-01-03 at 12.20.38 PM.png (17 kB, Jeff Nadler)
        10. server.properties (0.7 kB, Pedro Gontijo)
        11. td1.txt (81 kB, Pedro Gontijo)
        12. td2.txt (82 kB, Pedro Gontijo)
        13. td3.txt (82 kB, Pedro Gontijo)

            People

              Assignee: Unassigned
              Reporter: Pedro Gontijo (pedrong)
              Votes: 3
              Watchers: 8
