Kafka / KAFKA-4475

Poor kafka-streams throughput

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.10.1.0
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      Hi!

      I'm writing because I'm concerned about kafka-streams throughput.

      I have a single kafka-streams application instance that consumes from the 'input' topic, prints each message to the screen, and produces to the 'output' topic. All topics have 4 partitions. As you can see, the topology is very simple.

      I produce 120K messages/second to the 'input' topic, but when I measure the 'output' topic I receive only ~4K messages/second. I used the following configuration (all other parameters at their defaults):

      application.id: myApp
      bootstrap.servers: localhost:9092
      zookeeper.connect: localhost:2181
      num.stream.threads: 1
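
      For reference, the four settings above could be assembled in Java roughly as follows. This is only a sketch that uses plain string keys and java.util.Properties; the class name StreamsSettingsSketch and the method buildSettings are hypothetical and are not taken from the linked application.

```java
import java.util.Properties;

public class StreamsSettingsSketch {
    // Hypothetical helper: builds the four settings listed in the report.
    // Every other parameter is left at its default.
    public static Properties buildSettings(int numStreamThreads) {
        Properties props = new Properties();
        props.setProperty("application.id", "myApp");
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("num.stream.threads", Integer.toString(numStreamThreads));
        return props;
    }

    public static void main(String[] args) {
        // Print the settings for the single-thread case described above.
        Properties props = buildSettings(1);
        props.forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```

      Passing 2 or 4 instead of 1 gives the settings for the other thread counts tested in this report.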

      I ran several tests without success, but when I created a new 'input' topic with 1 partition (keeping the 'output' topic at 4 partitions) I got 120K messages/second in the 'output' topic.

      I then ran performance tests for the following cases (all topics have 4 partitions in all cases):

      Case A - 1 Instance:

      • With num.stream.threads set to 1 I had ~3785 messages/second
      • With num.stream.threads set to 2 I had ~3938 messages/second
      • With num.stream.threads set to 4 I had ~120K messages/second

      Case B - 2 Instances:

      • With num.stream.threads set to 1 I had ~3930 messages/second for each instance (total throughput ~8K messages/second)
      • With num.stream.threads set to 2 I had ~3945 messages/second for each instance (roughly the same total throughput as with num.stream.threads set to 1)

      Case C - 4 Instances:

      • With num.stream.threads set to 1 I had 3946 messages/second for each instance (total throughput ~17K messages/second)

      As you can see, I get the best results when num.stream.threads equals the number of partitions. So I have the following questions:

      • Why do I always get ~4K messages/second when the topic has more than 1 partition and num.stream.threads is set to 1?
      • In case C, 4 instances with num.stream.threads set to 1 should perform better than 1 instance with num.stream.threads set to 4. Is this assumption correct?
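
      To make the second question concrete: for a simple topology like this one, Kafka Streams creates one task per input partition and distributes those tasks over the available stream threads. The sketch below models that spread with a plain round-robin placement, which is an illustrative simplification and not the library's actual assignment logic; TaskSpreadSketch and spread are hypothetical names. It only shows how many partitions each thread would own, and it does not explain the ~4K messages/second ceiling.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskSpreadSketch {
    // Returns, for each thread, the partition numbers it would process.
    // Round-robin placement is an assumption made for illustration only.
    public static List<List<Integer>> spread(int partitions, int totalThreads) {
        List<List<Integer>> perThread = new ArrayList<>();
        for (int t = 0; t < totalThreads; t++) {
            perThread.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            perThread.get(p % totalThreads).add(p);
        }
        return perThread;
    }

    public static void main(String[] args) {
        // totalThreads = instances * num.stream.threads
        System.out.println(spread(4, 1)); // prints [[0, 1, 2, 3]]
        System.out.println(spread(4, 2)); // prints [[0, 2], [1, 3]]
        System.out.println(spread(4, 4)); // prints [[0], [1], [2], [3]]
    }
}
```

      In this model, 1 instance with 4 threads and 4 instances with 1 thread each both end up with one partition per thread, so any difference between those two setups would have to come from something other than the partition spread.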

      This is the kafka-streams application that I use: https://gist.github.com/Chorro/5522ec4acd1a005eb8c9663da86f5a18

          People

            Assignee: Unassigned
            Reporter: Juan J Chorro (jjchorrobe)
            Votes: 1
            Watchers: 4