Kafka / KAFKA-2550

[Kafka][0.8.2.1][Performance] Serious performance degradation when a topic has a large number of partitions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: 0.8.2.1
    • Fix Version/s: None
    • Component/s: clients, consumer, producer
    • Labels: None

    Description

      Because our business needs a large number of partitions, I tested how many partitions Kafka can support.
      I found that when a topic has many partitions, performance degrades seriously.
      My analysis shows that, in addition to the hard disk, the client is also a bottleneck.

      I profiled with JProfiler while producing and consuming 1,000,000 messages (message size: 500 bytes).
      1. Consumer high-level API (I could not upload the screenshots):
      ZookeeperConsumerConnector.scala-->rebalance
      -->val assignmentContext = new AssignmentContext(group, consumerIdString, config.excludeInternalTopics, zkClient)
      -->ZkUtils.getPartitionsForTopics(zkClient, myTopicThreadIds.keySet.toSeq)
      -->getPartitionAssignmentForTopics
      -->Json.parseFull(jsonPartitionMap)
      1) One topic, 400 partitions:
      JProfiler: 48.6% of CPU run time
      2) One topic, 3000 partitions:
      JProfiler: 97.8% of CPU run time

      The JSON document (jsonPartitionMap) is probably very large, which makes parsing slow.
      However, this function is executed only once, so the impact should be limited.
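
      To give a rough feel for how the partition map grows, here is a minimal sketch in Java. It assumes the assignment document has the shape {"version":1,"partitions":{"0":[...],...}}; the replication factor of 3, the 7 brokers, and all class and method names are illustrative assumptions, not values from this report. It only measures document size and does not reproduce the profiler figures above.

```java
public class PartitionMapSize {
    // Builds a JSON document mapping each partition id to a replica list,
    // e.g. {"version":1,"partitions":{"0":[0,1,2],"1":[1,2,3]}}.
    // Replica ids are assigned round-robin purely for illustration.
    static String buildAssignmentJson(int partitions, int replicationFactor, int brokers) {
        StringBuilder sb = new StringBuilder("{\"version\":1,\"partitions\":{");
        for (int p = 0; p < partitions; p++) {
            if (p > 0) sb.append(',');
            sb.append('"').append(p).append("\":[");
            for (int r = 0; r < replicationFactor; r++) {
                if (r > 0) sb.append(',');
                sb.append((p + r) % brokers);
            }
            sb.append(']');
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        // Compare the two partition counts used in the test above.
        for (int n : new int[] {400, 3000}) {
            String json = buildAssignmentJson(n, 3, 7);
            System.out.println(n + " partitions -> " + json.length() + " characters");
        }
    }
}
```

      The document grows roughly linearly with the partition count, so any parser whose cost is more than linear in input size would be hit harder at 3000 partitions than at 400.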

      2. Producer Scala API:
      BrokerPartitionInfo.scala--->getBrokerPartitionInfo:

          partitionMetadata.map { m =>
            m.leader match {
              case Some(leader) =>
                //y00163442 delete log print
                debug("Partition [%s,%d] has leader %d".format(topic, m.partitionId, leader.id))
                new PartitionAndLeader(topic, m.partitionId, Some(leader.id))
              case None =>
                //y00163442 delete log print
                //debug("Partition [%s,%d] does not have a leader yet".format(topic, m.partitionId))
                new PartitionAndLeader(topic, m.partitionId, None)
            }
          }.sortWith((s, t) => s.partitionId < t.partitionId)

      When the number of partitions is greater than 25, the 'format' function accounts for 44.8% of CPU run time.
      Nearly half of the time is spent in the format function. In my profile, format is executed whether or not log printing is enabled. This reduced TPS by a factor of five (25000 ---> 5000).
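
      The cost pattern described here (building the message before checking whether it will be logged) can be illustrated with a small, self-contained Java sketch. This is not Kafka's logging code; the class, the methods debugEager and debugLazy, and the call counter are all hypothetical names used only to show the difference between eager and deferred message formatting.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyDebugDemo {
    // Counts how many times the (expensive) format call actually runs.
    static final AtomicInteger formatCalls = new AtomicInteger();

    // Stand-in for the per-partition String.format in the hot path.
    static String describe(String topic, int partitionId, int leaderId) {
        formatCalls.incrementAndGet();
        return String.format("Partition [%s,%d] has leader %d", topic, partitionId, leaderId);
    }

    // Eager variant: the caller has already built the message,
    // so the formatting cost is paid even when debug is off.
    static void debugEager(boolean debugEnabled, String message) {
        if (debugEnabled) System.out.println(message);
    }

    // Lazy variant: the message is built only if debug is on.
    static void debugLazy(boolean debugEnabled, Supplier<String> message) {
        if (debugEnabled) System.out.println(message.get());
    }

    // Runs one pass over all partitions with debug disabled and
    // returns how many format calls were made.
    static int countFormatCalls(boolean lazy, int partitions) {
        formatCalls.set(0);
        for (int p = 0; p < partitions; p++) {
            final int id = p;
            if (lazy) {
                debugLazy(false, () -> describe("test", id, 1));
            } else {
                debugEager(false, describe("test", id, 1));
            }
        }
        return formatCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("eager format calls: " + countFormatCalls(false, 3000));
        System.out.println("lazy format calls:  " + countFormatCalls(true, 3000));
    }
}
```

      With debug disabled, the eager variant still formats one string per partition, while the lazy variant formats none. Whether the producer code path actually evaluates its message eagerly would need to be confirmed against the logging helper it uses.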

      3. Producer Java client (clients module):
      Function: org.apache.kafka.clients.producer.KafkaProducer.send
      The CPU run time of 'send' rises with the number of partitions; with 5000 partitions it reaches 60.8%.
      The broker's CPU, memory, disk, and network had not reached their limits, and the results are similar whether request.required.acks is set to 0 or 1, so I suspect there is a bottleneck in send itself.

      Unfortunately the picture uploads did not succeed, so the profiler results cannot be shown here.
      In my tests on a single server, one hard disk can support 1000 partitions and seven hard disks can support 3000 partitions. If the client bottleneck can be resolved, I estimate that seven hard disks could support more partitions.

      In an actual production configuration, the partitions would probably be spread across more than one topic, so things could be better.


          People

            nehanarkhede Neha Narkhede
            sledge.yanwei yanwei