KAFKA-7148

Kafka loads logs very slowly after going down with OutOfMemoryError


Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 0.10.1.1
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      Two questions:

      1. Is there any way to avoid the "IOException: Map failed", or is there some limit on VM memory size?

      2. Is it normal for Kafka to take 20+ seconds to load one partition's log file?
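
      To make question 1 concrete: the snippet below is a hypothetical standalone demo (my own sketch, not Kafka code) that reproduces the same "Map failed" error by creating file mappings until the kernel refuses the next mmap. It illustrates that the limit involved is an OS-level one (vm.max_map_count plus memory overcommit settings), not the Java heap, which is why raising or lowering -Xmx alone does not directly control it.

      import java.io.RandomAccessFile;
      import java.nio.MappedByteBuffer;
      import java.nio.channels.FileChannel;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical demo: map the same small file over and over. Each call to
      // FileChannel.map() adds one entry to /proc/<pid>/maps; once the kernel
      // refuses the next mmap, the JVM surfaces it as the same
      // OutOfMemoryError / IOException "Map failed" seen in the broker log in item 6.
      public class MapFailedDemo {
          public static void main(String[] args) throws Exception {
              Path file = Files.createTempFile("map-failed-demo", ".bin");
              List<MappedByteBuffer> keep = new ArrayList<>(); // hold refs so nothing gets unmapped
              try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                  raf.setLength(4096);
                  FileChannel ch = raf.getChannel();
                  for (int i = 0; i < 1_000_000; i++) {
                      try {
                          keep.add(ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096));
                      } catch (Throwable t) {
                          System.out.println("failed after " + i + " mappings: " + t);
                          return;
                      }
                  }
              }
          }
      }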

       

      Detailed info:

      1. Linux version:

      kafka_2.11-0.10.1.1> cat /etc/SuSE-release
      SUSE Linux Enterprise Server 11 (x86_64)
      VERSION = 11
      PATCHLEVEL = 3

       

      2. VM info: 4 CPUs, 32 GB RAM (4C32G)

       

      3. java -version
      java version "1.8.0_131"
      Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
      Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

       

       

      4. Start command:

      java -Xmx16G -Xms16G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/opt/**/kafka_2.11-0.10.1.1/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dkafka.logs.dir=/opt/**/kafka_2.11-0.10.1.1/bin/../logs -Dlog4j.configuration=file:./../config/log4j.properties -cp :………………………………

       

       

      5. Brokers & topics

      We have 3 brokers and 3 ZooKeeper nodes.
      We have 4 topics in this Kafka cluster:
      __consumer_offsets: 50 partitions, 3 replicas
      topic1: 5 partitions, 3 replicas
      topic2: 160 partitions, 3 replicas
      topic3: 160 partitions, 3 replicas

      Total data disk usage: 32G
      du -sh data/
      32G data/
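
      A rough count based only on the numbers above (my estimate, assuming every segment keeps its .index and .timeindex memory-mapped, which is what the AbstractIndex constructor in the item-6 stack trace does): 50 + 5 + 160 + 160 = 375 partitions, times 3 replicas spread over 3 brokers, is roughly 375 partition replicas per broker; with the ~2 segments per partition seen in the item-9 logs, that is on the order of 1,000 to 1,500 index mappings per broker. That is the same ballpark as the 1195 entries counted in item 7 and far below the default vm.max_map_count of 65536, which may be why raising that limit did not help.

      // Back-of-the-envelope estimate of index mmaps on one broker, using only
      // the topic/partition counts from this ticket. Nothing here is measured
      // on the actual cluster.
      public class IndexMapEstimate {
          public static void main(String[] args) {
              int totalPartitions = 50 + 5 + 160 + 160;                       // item 5: 375 partitions
              int replicas = 3;
              int brokers = 3;
              int partitionsPerBroker = totalPartitions * replicas / brokers; // ~375
              int segmentsPerPartition = 2;                                   // item-9 logs show 2 segments
              int indexFilesPerSegment = 2;                                   // .index + .timeindex
              System.out.println("estimated index mmaps per broker: "
                      + partitionsPerBroker * segmentsPerPartition * indexFilesPerSegment); // ~1500
          }
      }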

       

       

      6. Logs

      [2018-07-10 17:23:59,728] FATAL [Replica Manager on Broker 1]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
      kafka.common.KafkaStorageException: I/O exception in append to log '**************-13'
      at kafka.log.Log.append(Log.scala:349)
      at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
      at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
      at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
      at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
      at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
      at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
      at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
      at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.AbstractTraversable.map(Traversable.scala:104)
      at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
      at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
      at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:436)
      at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
      at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.IOException: Map failed
      at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
      at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:61)
      at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52)
      at kafka.log.LogSegment.<init>(LogSegment.scala:67)
      at kafka.log.Log.roll(Log.scala:778)
      at kafka.log.Log.maybeRoll(Log.scala:744)
      at kafka.log.Log.append(Log.scala:405)
      ... 22 more
      Caused by: java.lang.OutOfMemoryError: Map failed
      at sun.nio.ch.FileChannelImpl.map0(Native Method)
      at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
      ... 28 more
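
      The trace bottoms out in sun.nio.ch.FileChannelImpl.map while rolling to a new log segment. A quick, Linux-only way to see how close the broker process is to the mapping limit is sketched below; it is a hypothetical helper, essentially the Java equivalent of the cat /proc/<pid>/maps check done in item 7.

      import java.nio.file.Files;
      import java.nio.file.Paths;
      import java.util.stream.Stream;

      // Hypothetical helper (not Kafka code): count this JVM's current memory
      // mappings and print the kernel's per-process limit next to it.
      public class MapCount {
          public static void main(String[] args) throws Exception {
              long inUse;
              try (Stream<String> lines = Files.lines(Paths.get("/proc/self/maps"))) {
                  inUse = lines.count();
              }
              String limit = new String(
                      Files.readAllBytes(Paths.get("/proc/sys/vm/max_map_count"))).trim();
              System.out.println("mappings in use: " + inUse + " / vm.max_map_count: " + limit);
          }
      }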

       

       

      7. Then I followed KAFKA-6165 (https://issues.apache.org/jira/browse/KAFKA-6165?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22sun.nio.ch.FileChannelImpl.map%20Map%20failed%22) and raised vm.max_map_count from the default value (65536) to 262144:

      /sbin/sysctl -a |grep map
      vm.max_map_count = 262144

      cat /proc/2860/maps |wc -l
      1195

      I also changed kafka-run-class.sh, replacing `-XX:+DisableExplicitGC` with `-XX:+ExplicitGCInvokesConcurrent`.
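
      For what it is worth, the `-XX:+DisableExplicitGC` point from KAFKA-6165 matters because of how JDK 8 reacts to a failed mmap: sun.nio.ch.FileChannelImpl.map() catches the first OutOfMemoryError, calls System.gc() in the hope that unreferenced MappedByteBuffers are cleaned up and their mappings released, sleeps briefly, and retries once; only a second failure becomes the "IOException: Map failed" seen in item 6. With explicit GC disabled, that System.gc() call is a no-op, so the retry cannot free anything. The sketch below is only the shape of that retry pattern, not the JDK source:

      import java.io.IOException;
      import java.nio.MappedByteBuffer;
      import java.nio.channels.FileChannel;

      // Sketch (my paraphrase, not the actual FileChannelImpl code) of the JDK 8
      // retry pattern around mmap failures: the System.gc() below is ignored under
      // -XX:+DisableExplicitGC, so the second attempt fails the same way and
      // surfaces as "IOException: Map failed".
      public class MapRetrySketch {
          static MappedByteBuffer mapWithRetry(FileChannel ch, long size) throws IOException {
              try {
                  return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
              } catch (OutOfMemoryError first) {
                  System.gc();                              // no-op when explicit GC is disabled
                  try { Thread.sleep(100); } catch (InterruptedException ie) {
                      Thread.currentThread().interrupt();
                  }
                  try {
                      return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
                  } catch (OutOfMemoryError second) {
                      throw new IOException("Map failed", second);
                  }
              }
          }
      }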

       

       

      8. But the "IOException: Map failed" problem still exists.

      We then upgraded the VM to 4C64G and changed -Xmx16G -Xms16G to -Xmx4G -Xms4G.

       

       

      9. Slow log loading:

      [2018-07-10 17:35:33,481] INFO Completed load of log ***-37 with 2 log segments and log end offset 2441365 in 15678 ms (kafka.log.Log)
      [2018-07-10 17:35:33,484] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/**/data/*-34/00000000000002451611.index) has non-zero size but the last offset is 2451611 which is no larger than the base offset 2451611.}. deleting /opt/*/data/*-34/00000000000002451611.timeindex, /opt/*/data/**-34/00000000000002451611.index and rebuilding index... (kafka.log.Log)
      [2018-07-10 17:35:33,485] INFO Recovering unflushed segment 1086746 in log ***-34. (kafka.log.Log)
      [2018-07-10 17:35:49,070] INFO Recovering unflushed segment 2451611 in log ***-34. (kafka.log.Log)
      [2018-07-10 17:35:49,071] INFO Completed load of log ***-34 with 2 log segments and log end offset 2451613 in 15589 ms (kafka.log.Log)
      [2018-07-10 17:35:49,073] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/**/data/*-31/00000000000002437998.index) has non-zero size but the last offset is 2437998 which is no larger than the base offset 2437998.}. deleting /opt/*/data/*-31/00000000000002437998.timeindex, /opt/*/data/**-31/00000000000002437998.index and rebuilding index... (kafka.log.Log)
      [2018-07-10 17:35:49,090] INFO Recovering unflushed segment 1080802 in log ***-31. (kafka.log.Log)
      [2018-07-10 17:36:04,806] INFO Recovering unflushed segment 2437998 in log ***-31. (kafka.log.Log)
      [2018-07-10 17:36:04,807] INFO Completed load of log ***-31 with 2 log segments and log end offset 2438000 in 15735 ms (kafka.log.Log)
      [2018-07-10 17:36:04,810] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/**/data/*-28/00000000000002432077.index) has non-zero size but the last offset is 2432077 which is no larger than the base offset 2432077.}. deleting /opt/*/data/*-28/00000000000002432077.timeindex, /opt/*/data/**-28/00000000000002432077.index and rebuilding index... (kafka.log.Log)
      [2018-07-10 17:36:04,811] INFO Recovering unflushed segment 1081522 in log ***-28. (kafka.log.Log)
      [2018-07-10 17:36:19,537] INFO Recovering unflushed segment 2432077 in log ***-28. (kafka.log.Log)
      [2018-07-10 17:36:19,538] INFO Completed load of log ***-28 with 2 log segments and log end offset 2432079 in 14730 ms (kafka.log.Log)
      [2018-07-10 17:36:19,539] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/**/data/*-25/00000000000001080046.index) has non-zero size but the last offset is 1080046 which is no larger than the base offset 1080046.}. deleting /opt/*/data/*-25/00000000000001080046.timeindex, /opt/*/data/**-25/00000000000001080046.index and rebuilding index... (kafka.log.Log)
      [2018-07-10 17:36:34,232] INFO Recovering unflushed segment 1080046 in log ***-25. (kafka.log.Log)
      [2018-07-10 17:36:48,261] INFO Completed load of log ***-25 with 1 log segments and log end offset 2435166 in 28721 ms (kafka.log.Log)
      [2018-07-10 17:36:48,266] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/**/data/*-22/00000000000002438178.index) has non-zero size but the last offset is 2438178 which is no larger than the base offset 2438178.}. deleting /opt/*/data/*-22/00000000000002438178.timeindex, /opt/*/data/**-22/00000000000002438178.index and rebuilding index... (kafka.log.Log)
      [2018-07-10 17:36:48,267] INFO Recovering unflushed segment 1081500 in log ***-22. (kafka.log.Log)
      [2018-07-10 17:37:02,803] INFO Recovering unflushed segment 2438178 in log ***-22. (kafka.log.Log)
      [2018-07-10 17:37:02,803] INFO Completed load of log ***-22 with 2 log segments and log end offset 2438180 in 14540 ms (kafka.log.Log)
      [2018-07-10 17:37:02,807] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/**/data/*-19/00000000000001081412.index) has non-zero size but the last offset is 1081412 which is no larger than the base offset 1081412.}. deleting /opt/*/data/*-19/00000000000001081412.timeindex, /opt/*/data/**-19/00000000000001081412.index and rebuilding index... (kafka.log.Log)
      [2018-07-10 17:37:17,536] INFO Recovering unflushed segment 1081412 in log ***-19. (kafka.log.Log)
      [2018-07-10 17:37:31,570] INFO Completed load of log ***-19 with 1 log segments and log end offset 2443668 in 28764 ms (kafka.log.Log)
      [2018-07-10 17:37:31,573] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/**/data/*-35/00000000000002440999.index) has non-zero size but the last offset is 2440999 which is no larger than the base offset 2440999.}. deleting /opt/*/data/*-35/00000000000002440999.timeindex, /opt/*/data/**-35/00000000000002440999.index and rebuilding index... (kafka.log.Log)

       

       

      10. Then I checked the CPU: it was at 100% usage while loading logs. Using top -H -p <pid> and jstack, I found the busy thread:

      "pool-3-thread-1" #20 prio=5 os_prio=0 tid=0x00007f7041537000 nid=0x5c3a runnable [0x00007f703042e000]

      java.lang.Thread.State: RUNNABLE

      at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)

      at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)

      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)

      at sun.nio.ch.IOUtil.read(IOUtil.java:197)

      at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:741)

      at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)

      at kafka.log.FileMessageSet$$anon$1.makeNext(FileMessageSet.scala:319)

      at kafka.log.FileMessageSet$$anon$1.makeNext(FileMessageSet.scala:308)

      at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:64)

      at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:56)

      at kafka.log.LogSegment.recover(LogSegment.scala:211)

      at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)

      at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)

      at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)

      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)

      at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)

      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)

      at kafka.log.Log.loadSegments(Log.scala:179)

      at kafka.log.Log.<init>(Log.scala:108)

      at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:151)

      at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:58)

      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

      at java.util.concurrent.FutureTask.run(FutureTask.java:266)

      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

      at java.lang.Thread.run(Thread.java:748)
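
      One possible explanation for both the 15-28 s per partition in item 9 and the single 100%-busy thread here (an assumption on my part, not verified on this cluster): after the broker halted, every partition with an unflushed segment has to be recovered, which means re-reading the segment through FileMessageSet (as the stack above shows), re-validating the messages, and rebuilding the .index/.timeindex files. By default this runs on num.recovery.threads.per.data.dir = 1 thread per data directory, which would match the lone "pool-3-thread-1". Raising that setting is the usual way to shorten startup after an unclean shutdown; the arithmetic below uses only numbers already in this ticket.

      // Rough wall-clock estimate for reloading all logs on one broker, using
      // the ~15 s per partition observed in item 9 and the ~375 partition
      // replicas per broker derived after item 5. The thread counts are
      // hypothetical values for num.recovery.threads.per.data.dir.
      public class RecoveryWallClockEstimate {
          public static void main(String[] args) {
              int partitionsPerBroker = 375;     // estimate derived from item 5
              double secondsPerPartition = 15;   // observed in item 9
              for (int threads : new int[] {1, 4, 8}) {
                  double minutes = partitionsPerBroker * secondsPerPartition / threads / 60.0;
                  System.out.printf("recovery threads = %d -> ~%.0f min to load all logs%n",
                          threads, minutes);
              }
          }
      }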


          People

            Assignee: Unassigned
            Reporter: xiaotong wang
            Votes: 0
            Watchers: 2
