Kafka / KAFKA-7757

Too many open files after java.io.IOException: Connection to n was disconnected before the response was read

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: core
    • Labels:
      None

      Description

      We upgraded a cluster with 3 brokers from 0.10.2.2 to 2.1.0.

      After a while (hours), 2 brokers start to throw:

      java.io.IOException: Connection to NN was disconnected before the response was read
      at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
      at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
      at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
      at kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:241)
      at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:130)
      at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:129)
      at scala.Option.foreach(Option.scala:257)
      at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
      at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
      at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
      

      File descriptors start to pile up, and if I do not restart the broker, it throws "Too many open files" and crashes.

      ERROR Error while accepting connection (kafka.network.Acceptor)
      java.io.IOException: Too many open files in system
      at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
      at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
      at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
      at kafka.network.Acceptor.accept(SocketServer.scala:460)
      at kafka.network.Acceptor.run(SocketServer.scala:403)
      at java.lang.Thread.run(Thread.java:748)
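
A note on the message above: on Linux, "Too many open files in system" is the error text for ENFILE (the system-wide file table is full), whereas hitting the per-process limit (EMFILE) reads just "Too many open files". If that reading applies here, the kernel counters in /proc/sys/fs/file-nr are worth watching alongside the broker's own descriptor count. The following is a minimal sketch, assuming a Linux host; the helper names (`parse_file_nr`, `usage_fraction`, `read_file_nr`) are made up for illustration and are not part of Kafka.

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    allocated: int  # file handles currently allocated system-wide
    unused: int     # allocated but unused handles
    maximum: int    # system-wide ceiling (fs.file-max)


def parse_file_nr(text: str) -> FileNr:
    """Parse the three whitespace-separated counters of /proc/sys/fs/file-nr."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {text!r}")
    return FileNr(*(int(f) for f in fields))


def usage_fraction(nr: FileNr) -> float:
    """Fraction of the system-wide limit that is currently in use."""
    return (nr.allocated - nr.unused) / nr.maximum


def read_file_nr(path: str = "/proc/sys/fs/file-nr") -> FileNr:
    """Read the live counters (Linux only)."""
    with open(path) as fh:
        return parse_file_nr(fh.read())
```

Logging `usage_fraction(read_file_nr())` periodically on each broker host would show whether the system-wide table, rather than the per-process ulimit, is what runs out.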
      

       

      After some hours the issue happens again. It has happened on all brokers, so it is not specific to one instance.
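
To narrow down what is piling up, it may help to break the broker's open descriptors down by type before restarting. Below is a rough sketch, assuming Linux and permission to read /proc/<pid>/fd for the broker process; `classify_fd_target` and `summarize_fds` are hypothetical helper names, not Kafka tooling.

```python
import os
import sys
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"


def summarize_fds(pid: int) -> Counter:
    """Count the open descriptors of `pid` by category (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts


# Usage: python fd_summary.py <broker-pid>
if __name__ == "__main__" and len(sys.argv) == 2:
    for kind, n in summarize_fds(int(sys.argv[1])).most_common():
        print(f"{kind:12s} {n}")
```

A count dominated by sockets would point at connections (for example the replica fetcher reconnects in the first trace), while a count dominated by regular files would point at log segments and indexes.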

       

        Attachments

        1. server.properties
          0.7 kB
          Pedro Gontijo
        2. td1.txt
          81 kB
          Pedro Gontijo
        3. td3.txt
          82 kB
          Pedro Gontijo
        4. td2.txt
          82 kB
          Pedro Gontijo
        5. kafka-allocated-file-handles.png
          12 kB
          Mathias Kub
        6. Screen Shot 2019-01-03 at 12.20.38 PM.png
          17 kB
          Jeff Nadler
        7. fd-spike-threads.txt
          79 kB
          Jeff Nadler
        8. dump.txt
          165 kB
          arthur

              People

              • Assignee:
                Unassigned
                Reporter:
                pedrong Pedro Gontijo
              • Votes:
                3
                Watchers:
                8
