[KAFKA-3900] High CPU util on broker


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0.0
    • Fix Version/s: None
    • Component/s: network, replication
    • Environment:
      kafka = 2.11-0.10.0.0
      java version "1.8.0_91"
      amazon linux

    Description

      I run a Kafka cluster in Amazon on m4.xlarge instances (4 CPUs and 16 GB of memory, 14 GB of which is allocated to the Kafka heap). The cluster has three nodes.

      The load is not high (about 6,000 messages/sec) and cpu_idle is around 70%, but sometimes (roughly once a day) I see this message in server.log:

      [2016-06-24 14:52:22,299] WARN [ReplicaFetcherThread-0-2], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@6eaa1034 (kafka.server.ReplicaFetcherThread)
      java.io.IOException: Connection to 2 was disconnected before the response was read
      at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
      at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
      at scala.Option.foreach(Option.scala:257)
      at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
      at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
      at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:137)
      at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
      at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
      at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
      at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
      at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
      at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
      at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
      at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

      I understand this can be a network glitch, but why does Kafka then eat all the CPU time?
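      One way to see where the CPU is actually going is to map the busiest native threads of the broker JVM to JVM thread names, e.g. to confirm whether a ReplicaFetcherThread is spinning. Below is a minimal diagnostic sketch (not from this ticket or the Kafka codebase): it samples per-thread CPU from /proc on Linux and prints the hottest thread IDs in hex, which can be matched against the nid=0x... values in a jstack dump taken at the same time. The broker PID argument and the 5-second sampling window are assumptions.

      #!/usr/bin/env python3
      # Hypothetical diagnostic helper (not part of Kafka): sample per-thread CPU of the
      # broker JVM via /proc and print the busiest thread IDs in hex, so they can be
      # matched against the nid=0x... field of a `jstack <pid>` dump taken at the same time.
      import os
      import sys
      import time

      def thread_cpu_ticks(pid):
          """Return {tid: utime+stime clock ticks} for every thread of the process."""
          ticks = {}
          for tid in os.listdir("/proc/%d/task" % pid):
              try:
                  with open("/proc/%d/task/%s/stat" % (pid, tid)) as f:
                      # Everything after the "(comm)" field is whitespace-separated;
                      # utime and stime are the 12th and 13th of those fields.
                      fields = f.read().rsplit(")", 1)[1].split()
                  ticks[int(tid)] = int(fields[11]) + int(fields[12])
              except (OSError, IndexError, ValueError):
                  continue  # thread exited between listing and reading
          return ticks

      def main():
          pid = int(sys.argv[1])      # broker JVM pid, e.g. from `pgrep -f kafka.Kafka`
          before = thread_cpu_ticks(pid)
          time.sleep(5)               # sampling window
          after = thread_cpu_ticks(pid)
          top = sorted(((after.get(t, 0) - c, t) for t, c in before.items()), reverse=True)
          print("ticks  tid      nid (hex, match against jstack)")
          for delta, tid in top[:10]:
              print("%5d  %-8d 0x%x" % (delta, tid, tid))

      if __name__ == "__main__":
          main()

      Run it against the broker PID during a spike and take a jstack dump in the same window; matching the hex IDs should show whether the time goes to the replica fetcher, the network threads, or GC.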

      My config:

      inter.broker.protocol.version=0.10.0.0
      log.message.format.version=0.10.0.0

      default.replication.factor=3
      num.partitions=3

      replica.lag.time.max.ms=15000

      broker.id=0
      listeners=PLAINTEXT://:9092
      log.dirs=/mnt/kafka/kafka
      log.retention.check.interval.ms=300000
      log.retention.hours=168
      log.segment.bytes=1073741824
      num.io.threads=20
      num.network.threads=10
      num.partitions=1
      num.recovery.threads.per.data.dir=2
      socket.receive.buffer.bytes=102400
      socket.request.max.bytes=104857600
      socket.send.buffer.bytes=102400
      zookeeper.connection.timeout.ms=6000
      delete.topic.enable = true
      broker.max_heap_size=10 GiB

      Any ideas?
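      For what it's worth, a quick way to check whether these disconnects actually cluster around the CPU spikes is to count the "Error in fetch" WARNs per hour. The sketch below is only an illustration, not anything shipped with Kafka; the default log path and the timestamp layout are assumptions based on the message quoted above.

      #!/usr/bin/env python3
      # Hypothetical helper (not part of Kafka): count ReplicaFetcherThread "Error in fetch"
      # WARN lines per hour in server.log, to correlate the disconnects with CPU spikes.
      # The default log path and the timestamp format are assumptions taken from the
      # message quoted in the description.
      import re
      import sys
      from collections import Counter

      LOG_PATH = sys.argv[1] if len(sys.argv) > 1 else "server.log"
      PATTERN = re.compile(
          r"^\[(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3}\] WARN .*Error in fetch")

      def main():
          per_hour = Counter()
          with open(LOG_PATH, errors="replace") as f:
              for line in f:
                  m = PATTERN.match(line)
                  if m:
                      per_hour[m.group(1)] += 1   # key is "YYYY-MM-DD HH"
          for hour, count in sorted(per_hour.items()):
              print("%s:00  %d fetch error(s)" % (hour, count))

      if __name__ == "__main__":
          main()

      If the hourly error counts line up with the CPU graph, that would point at the fetcher's retry path rather than normal traffic.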

Attachments

Activity

People

    Assignee: Unassigned
    Reporter: akonyaev (Andrey Konyaev)
    Votes: 1
    Watchers: 9

Dates

    Created:
    Updated: