Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-7327

The CPU memory of the kafak master node continues to soar, does not recycle, and finally the service fails?

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Blocker
    • Resolution: Unresolved
    • 1.1.0
    • None
    • consumer
    • None
    • linux centos7

    Description

      Xmlipcregsvc-> 172.18.58.184:60686 (CLOSE_WAIT) has many such ports to close waiting, which is the application connection side.Why wait?Memory nor recycling three services are 2 nuclear 4 gb of memory, this before is kafka3G, found that memory, the heap memory, and then I will limit kfaka up to 2 g, but the master node to run after a period of time, and submitted to the heap memory and heap memory leak, I free -m looked at it and really have 100 MB of memory, I don't know where memory use, kafka made up 80% of the process of memory, CPU by more than 100%, what reason is this?The configuration parameters have been checked with the official website. The default is not acceptable.

      XmlIpcRegSvc->172.18.58.184:60686 (CLOSE_WAIT) 有很多这个样的端口关闭等待,这是应用连接端。为什么一直等待呢?内存也没有回收 我3台服务是 2核 4G 内存,这之前给的是kafka3G,发现内存没了,报堆外内存溢出,然后我就限制kfaka最大为2G,但主节点跑一段时间后,又报堆内存溢出和堆外内存溢出,我free -m看了一下,内存确实还有100MB了,不知内存用在那里,kafka 这个进程暂用完了 80%的内存,cpu 100%多了,这是什么原因呢?配置参数和官网核对了一下,全用默认的也不行,

      ` 1772 liandong 20 0 6398984 2.146g 16112 S 101.3 58.0 93:59.72 /usr/local/jdk1.8/bin/java -Xmx2G -Xms1G -server -XX:UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -XX:MaxDirectMemorySize=512m -Xloggc:/data/kafka/bin/../logs/kafkaSer...`

      kafka server.log log
      `[2018-08-23 07:56:11,788] INFO [GroupCoordinator 0]: Stabilized group consumer.web.log generation 268 (__consumer_offsets-24) (kafka.coordinator.group.GroupCoordinator)
      [2018-08-23 07:56:12,054] ERROR Processor got uncaught exception. (kafka.network.Processor)
      java.lang.OutOfMemoryError: Java heap space
      [2018-08-23 07:56:13,846] ERROR Processor got uncaught exception. (kafka.network.Processor)
      java.lang.OutOfMemoryError: Java heap space
      [2018-08-23 07:56:15,673] ERROR Processor got uncaught exception. (kafka.network.Processor)
      java.lang.OutOfMemoryError: Direct buffer memory
      at java.nio.Bits.reserveMemory(Bits.java:694)
      at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
      at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
      at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
      at sun.nio.ch.IOUtil.read(IOUtil.java:195)
      at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
      at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104)
      at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:145)
      at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
      at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
      at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
      at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:557)
      at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:495)
      at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
      at kafka.network.Processor.poll(SocketServer.scala:628)
      at kafka.network.Processor.run(SocketServer.scala:545)
      at java.lang.Thread.run(Thread.java:748)
      [2018-08-23 07:56:16,379] ERROR Processor got uncaught exception. (kafka.network.Processor)
      java.lang.OutOfMemoryError: Java heap space`

      172.18.58.184:speedtrace (CLOSE_WAIT) 172.18.58.184 是 kafka client connect
      lsof -i | grep java
      `java 1772 liandong 83u IPv4 7990697 0t0 TCP *:36145 (LISTEN)
      java 1772 liandong 84u IPv4 7990698 0t0 TCP *:9099 (LISTEN)
      java 1772 liandong 85u IPv4 7990701 0t0 TCP *:40745 (LISTEN)
      java 1772 liandong 100u IPv4 7990709 0t0 TCP prod_data_kafka_2:44688->prod_data_zk:eforward (ESTABLISHED)
      java 1772 liandong 193u IPv4 7989816 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc (LISTEN)
      java 1772 liandong 224u IPv4 8019955 0t0 TCP prod_data_kafka_2:9099->172.18.58.184:47430 (ESTABLISHED)
      java 1772 liandong 228u IPv4 8018733 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:33032 (CLOSE_WAIT)
      java 1772 liandong 229u IPv4 7990859 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:51334 (ESTABLISHED)
      java 1772 liandong 230u IPv4 8022506 0t0 TCP prod_data_kafka_2:36145->172.18.58.184:46112 (ESTABLISHED)
      java 1772 liandong 235u IPv4 7989829 0t0 TCP prod_data_kafka_2:32976->prod_data_kafka_1:XmlIpcRegSvc (ESTABLISHED)
      java 1772 liandong 236u IPv4 8022224 0t0 TCP prod_data_kafka_2:36145->172.18.58.184:46024 (ESTABLISHED)
      java 1772 liandong 243u IPv4 7998548 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_3:39816 (ESTABLISHED)
      java 1772 liandong 247u IPv4 7998555 0t0 TCP prod_data_kafka_2:33206->prod_data_kafka_3:XmlIpcRegSvc (ESTABLISHED)
      java 1772 liandong 248u IPv4 8017061 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60686 (CLOSE_WAIT)
      java 1772 liandong 251u IPv4 7999481 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_1:48914 (ESTABLISHED)
      java 1772 liandong 254u IPv4 8016659 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60920 (CLOSE_WAIT)
      java 1772 liandong 255u IPv4 8009660 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:59356 (ESTABLISHED)
      java 1772 liandong 256u IPv4 8017062 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60700 (ESTABLISHED)
      java 1772 liandong 257u IPv4 8022398 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:33626 (ESTABLISHED)
      java 1772 liandong 259u IPv4 8019887 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:speedtrace (CLOSE_WAIT)
      `
      gc log
      `2018-08-22T19:01:45.014+0800: 31537.291: [GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.0147456 secs]
      [Parallel Time: 12.9 ms, GC Workers: 2]
      [GC Worker Start (ms): Min: 31537291.3, Avg: 31537291.3, Max: 31537291.3, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 1.8, Avg: 1.9, Max: 1.9, Diff: 0.1, Sum: 3.8]
      [Update RS (ms): Min: 1.9, Avg: 1.9, Max: 1.9, Diff: 0.0, Sum: 3.9]
      [Processed Buffers: Min: 14, Avg: 15.5, Max: 17, Diff: 3, Sum: 31]
      [Scan RS (ms): Min: 4.5, Avg: 4.5, Max: 4.5, Diff: 0.0, Sum: 9.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 4.1, Avg: 4.2, Max: 4.2, Diff: 0.1, Sum: 8.3]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Termination Attempts: Min: 4, Avg: 4.0, Max: 4, Diff: 0, Sum: 8]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 12.5, Avg: 12.5, Max: 12.5, Diff: 0.0, Sum: 25.1]
      [GC Worker End (ms): Min: 31537303.8, Avg: 31537303.8, Max: 31537303.8, Diff: 0.0]
      [Code Root Fixup: 0.1 ms]
      [Code Root Purge: 0.0 ms]
      [Clear CT: 0.3 ms]
      [Other: 1.4 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.2 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 0.6 ms]
      [Eden: 781.0M(781.0M)->0.0B(781.0M) Survivors: 3072.0K->3072.0K Heap: 2106.0M(2347.0M)->1325.2M(2347.0M)]
      [Times: user=0.03 sys=0.00, real=0.02 secs]
      2018-08-22T19:01:45.029+0800: 31537.306: [GC concurrent-root-region-scan-start]
      2018-08-22T19:01:45.039+0800: 31537.315: [GC concurrent-root-region-scan-end, 0.0098860 secs]
      2018-08-22T19:01:45.039+0800: 31537.315: [GC concurrent-mark-start]
      2018-08-22T19:01:45.111+0800: 31537.388: [GC concurrent-mark-end, 0.0721221 secs]
      2018-08-22T19:01:45.111+0800: 31537.388: [GC remark 2018-08-22T19:01:45.111+0800: 31537.388: [Finalize Marking, 0.0002506 secs] 2018-08-22T19:01:45.111+0800: 31537.388: [GC ref-proc, 0.0008536 secs] 2018-08-22T19:01:45.112+0800: 31537.389: [Unloading, 0.0159521 secs], 0.0264459 secs]
      [Times: user=0.05 sys=0.00, real=0.03 secs]
      2018-08-22T19:01:45.139+0800: 31537.415: [GC cleanup 1339M->1339M(2347M), 0.0026152 secs]
      [Times: user=0.00 sys=0.00, real=0.00 secs]
      2018-08-22T19:01:48.222+0800: 31540.499: [GC pause (G1 Evacuation Pause) (young), 0.0141944 secs]
      [Parallel Time: 12.6 ms, GC Workers: 2]
      [GC Worker Start (ms): Min: 31540499.4, Avg: 31540499.4, Max: 31540499.4, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 1.4, Avg: 1.4, Max: 1.4, Diff: 0.1, Sum: 2.8]
      [Update RS (ms): Min: 2.3, Avg: 2.3, Max: 2.3, Diff: 0.1, Sum: 4.6]
      [Processed Buffers: Min: 11, Avg: 17.0, Max: 23, Diff: 12, Sum: 34]
      [Scan RS (ms): Min: 4.4, Avg: 4.5, Max: 4.5, Diff: 0.1, Sum: 8.9]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 4.2, Avg: 4.3, Max: 4.3, Diff: 0.1, Sum: 8.5]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Termination Attempts: Min: 1, Avg: 2.5, Max: 4, Diff: 3, Sum: 5]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [GC Worker Total (ms): Min: 12.5, Avg: 12.5, Max: 12.5, Diff: 0.0, Sum: 24.9]
      [GC Worker End (ms): Min: 31540511.9, Avg: 31540511.9, Max: 31540511.9, Diff: 0.0]
      [Code Root Fixup: 0.1 ms]
      [Code Root Purge: 0.0 ms]
      [Clear CT: 0.3 ms]
      [Other: 1.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.2 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 0.6 ms]
      [Eden: 781.0M(781.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap: 2106.2M(2347.0M)->1325.2M(2347.0M)]
      [Times: user=0.02 sys=0.00, real=0.01 secs]
      2018-08-22T19:01:51.373+0800: 31543.650: [GC pause (G1 Evacuation Pause) (young), 0.0146431 secs]
      [Parallel Time: 13.1 ms, GC Workers: 2]
      [GC Worker Start (ms): Min: 31543649.9, Avg: 31543649.9, Max: 31543649.9, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 1.4, Avg: 1.4, Max: 1.5, Diff: 0.1, Sum: 2.8]
      [Update RS (ms): Min: 2.4, Avg: 2.4, Max: 2.5, Diff: 0.1, Sum: 4.8]
      [Processed Buffers: Min: 8, Avg: 17.5, Max: 27, Diff: 19, Sum: 35]
      [Scan RS (ms): Min: 4.5, Avg: 4.6, Max: 4.7, Diff: 0.2, Sum: 9.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 4.4, Avg: 4.4, Max: 4.5, Diff: 0.1, Sum: 8.9]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 13.0, Avg: 13.0, Max: 13.0, Diff: 0.0, Sum: 25.9]
      [GC Worker End (ms): Min: 31543662.8, Avg: 31543662.8, Max: 31543662.9, Diff: 0.0]
      [Code Root Fixup: 0.1 ms]
      [Code Root Purge: 0.0 ms]
      [Clear CT: 0.4 ms]
      [Other: 1.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 0.6 ms]
      [Eden: 780.0M(780.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap: 2105.2M(2347.0M)->1325.3M(2347.0M)]
      [Times: user=0.02 sys=0.00, real=0.02 secs] `

      Attachments

        1. server.log.2018-08-23-10
          177 kB
          ruiliang

        Activity

          People

            Unassigned Unassigned
            ruilaing ruiliang
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated:
                Original Estimate - 12h
                12h
                Remaining:
                Remaining Estimate - 12h
                12h
                Logged:
                Time Spent - Not Specified
                Not Specified