Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None

      Description

      The existing perf tools - ProducerPerformance.scala, ConsumerPerformance.scala and SimpleConsumerPerformance.scala are slightly buggy. It will be good to -

      1. move them to a perf directory from the existing kafka/tools location
      2. fix the bugs, so that they measure throughput correctly

      1. KAFKA-176-v2.patch
        55 kB
        Neha Narkhede
      2. kafka-176.patch
        55 kB
        Neha Narkhede

        Issue Links

          Activity

          Hide
          Neha Narkhede added a comment -

          Thanks. Just committed this. I'm thinking there will be more improvements to these perf tools as part of KAFKA-175

          Show
          Neha Narkhede added a comment - Thanks. Just committed this. I'm thinking there will be more improvements to these perf tools as part of KAFKA-175
          Hide
          Jay Kreps added a comment -

          +1

          Show
          Jay Kreps added a comment - +1
          Hide
          Neha Narkhede added a comment -

          1. Header moved to perf tools
          2. Default is aggregated stats. Added a show-detailed-stats option for finer grained stats per reporting-interval
          3. Fixed the variable size message bugs in producer performance tool
          4. Abstracted out common perf configs in a top-level PerfConfig class.
          5. Moved tryCleanupZookeeper to Utils

          Show
          Neha Narkhede added a comment - 1. Header moved to perf tools 2. Default is aggregated stats. Added a show-detailed-stats option for finer grained stats per reporting-interval 3. Fixed the variable size message bugs in producer performance tool 4. Abstracted out common perf configs in a top-level PerfConfig class. 5. Moved tryCleanupZookeeper to Utils
          Hide
          Neha Narkhede added a comment -

          1. tryCleanupZookeeper can definitely be a Util.
          2. This is also a good suggestion. There is definitely a lot of overlap amongst PerfConfig in all the perf tests.
          3. For scripts, please can you also take a look at KAFKA-175 ?

          Thanks. I will incorporate these suggestions and submit an updated patch tomorrow.

          Show
          Neha Narkhede added a comment - 1. tryCleanupZookeeper can definitely be a Util. 2. This is also a good suggestion. There is definitely a lot of overlap amongst PerfConfig in all the perf tests. 3. For scripts, please can you also take a look at KAFKA-175 ? Thanks. I will incorporate these suggestions and submit an updated patch tomorrow.
          Hide
          Jay Kreps added a comment -

          tryCleanupZookeeper looks cut-and-pasted from place to place. It shows up in ConsoleConsumer.scala, ReplayLogProducer.scala, and ConsumerPerformance.scala. We should not do that. Can we make some kind of utility function for that?

          Also there is a PerfConfig class, which is a great idea as a way to normalize some of the config options we are using between all the tools. But it looks like the class is just duplicated between the tools. Can this be shared?

          I would like to cleanup the scripts used to run these things so that we get rid of all the silly ancient ones (*simple-perf-test.sh and *shell.sh), but I think I will open a seperate ticket for that since it is unrelated to your changes.

          Show
          Jay Kreps added a comment - tryCleanupZookeeper looks cut-and-pasted from place to place. It shows up in ConsoleConsumer.scala, ReplayLogProducer.scala, and ConsumerPerformance.scala. We should not do that. Can we make some kind of utility function for that? Also there is a PerfConfig class, which is a great idea as a way to normalize some of the config options we are using between all the tools. But it looks like the class is just duplicated between the tools. Can this be shared? I would like to cleanup the scripts used to run these things so that we get rid of all the silly ancient ones (*simple-perf-test.sh and *shell.sh), but I think I will open a seperate ticket for that since it is unrelated to your changes.
          Hide
          Neha Narkhede added a comment -

          1. I think we can add a command line option --header that will control the display of the header

          2. Good suggestion. I think what you are saying is that aggregate stats should be the default, and instead of having --aggregate option, we should have --showDetailedStats option. I think that is a good idea.

          3. OK. I'll take a stab at that, and upload a new patch.

          4. KAFKA-175 will need to be updated as well. It would be great if people can look at that too.

          Show
          Neha Narkhede added a comment - 1. I think we can add a command line option --header that will control the display of the header 2. Good suggestion. I think what you are saying is that aggregate stats should be the default, and instead of having --aggregate option, we should have --showDetailedStats option. I think that is a good idea. 3. OK. I'll take a stab at that, and upload a new patch. 4. KAFKA-175 will need to be updated as well. It would be great if people can look at that too.
          Hide
          Jun Rao added a comment -

          1. It can be in the code. Maybe we just need a command line option to control whether to display header or not?

          2. I am suggesting that we alway show aggregated output. So there won't be a -aggregate option. Instead, have a --showDetails option to enable/disable detailed stats in each thread, probably with default set to false.

          3. Some system tests can use a producer that generates variable-sized random messages. Instead of having another producer tool, it would be good if we can just allow such option in this tool.

          Show
          Jun Rao added a comment - 1. It can be in the code. Maybe we just need a command line option to control whether to display header or not? 2. I am suggesting that we alway show aggregated output. So there won't be a -aggregate option. Instead, have a --showDetails option to enable/disable detailed stats in each thread, probably with default set to false. 3. Some system tests can use a producer that generates variable-sized random messages. Instead of having another producer tool, it would be good if we can just allow such option in this tool.
          Hide
          Neha Narkhede added a comment -

          This is a draft patch, so things that I wasn't sure off, are left incomplete

          1. The header being in debug logging is clearly not ideal. There are 2 ways of exposing the header -
          a. In the code
          b. In the scripts

          Since some graphing tools might need the headers in a certain format, I initially thought its better left in the scripts (see KAFKA-175). However, since the code controls the display of the data, it might as well be left there.

          2. The --avg option already controls that. We could probably rename that to -aggregate instead ?

          3.1 & 3.2 The reason I haven't touched the variable size message path is because I don't think it is well implemented. When selecting that option, the throughput actually reduces due to the way it is implemented. Not sure having a variable message size option actually helps conclude anything in this perf test ?

          4. Good point.

          I will update the patch, once we have a better idea about the questions above.

          Show
          Neha Narkhede added a comment - This is a draft patch, so things that I wasn't sure off, are left incomplete 1. The header being in debug logging is clearly not ideal. There are 2 ways of exposing the header - a. In the code b. In the scripts Since some graphing tools might need the headers in a certain format, I initially thought its better left in the scripts (see KAFKA-175 ). However, since the code controls the display of the data, it might as well be left there. 2. The --avg option already controls that. We could probably rename that to -aggregate instead ? 3.1 & 3.2 The reason I haven't touched the variable size message path is because I don't think it is well implemented. When selecting that option, the throughput actually reduces due to the way it is implemented. Not sure having a variable message size option actually helps conclude anything in this perf test ? 4. Good point. I will update the patch, once we have a better idea about the questions above.
          Hide
          Jun Rao added a comment -

          ProducerPerformance:
          1. The following code seems to be used to enable/disable the header. Is it better to control that in config instead using debug logging (so that it's not mixed with other debug logging)? Also, the header info is not complete, missing the first few fields. The header is probably useful for stats printed out periodically in each thread. So it should be printed out early, if enabled.
          if(logger.isDebugEnabled)
          logger.debug("message size, batch size, total data sent in MB, MB/sec, total data sent in nMsg, nMsg/sec")
          2. Whether avgPerf is specified or not, the user is probably always interested in the aggregated numbers across all threads. How about we always print it out and have a config option "showDetails" to enable/disable periodic reporting in each thread. Ditto in other perf tools.
          3. ProducerThread has multiple bugs:
          3.1. Variable-sized messages are not picked up in Async mode
          3.2. In Sync mode, messageSet needs to be reset for each batch, if messages are of variable size (seems to be an existing bug)
          4. It's better not to duplicate the following code. Defining it once in a static method seems better.
          println(("%s, %d, %d, %d, %d, %.2f, %.4f, %d, %.4f").format(formattedReportTime, config.compressionCodec.codec,
          threadId, config.messageSize, config.batchSize, (bytesSent*1.0)/(1024 * 1024), mbPerSec, nSends, numMessagesPerSec))

          Show
          Jun Rao added a comment - ProducerPerformance: 1. The following code seems to be used to enable/disable the header. Is it better to control that in config instead using debug logging (so that it's not mixed with other debug logging)? Also, the header info is not complete, missing the first few fields. The header is probably useful for stats printed out periodically in each thread. So it should be printed out early, if enabled. if(logger.isDebugEnabled) logger.debug("message size, batch size, total data sent in MB, MB/sec, total data sent in nMsg, nMsg/sec") 2. Whether avgPerf is specified or not, the user is probably always interested in the aggregated numbers across all threads. How about we always print it out and have a config option "showDetails" to enable/disable periodic reporting in each thread. Ditto in other perf tools. 3. ProducerThread has multiple bugs: 3.1. Variable-sized messages are not picked up in Async mode 3.2. In Sync mode, messageSet needs to be reset for each batch, if messages are of variable size (seems to be an existing bug) 4. It's better not to duplicate the following code. Defining it once in a static method seems better. println(("%s, %d, %d, %d, %d, %.2f, %.4f, %d, %.4f").format(formattedReportTime, config.compressionCodec.codec, threadId, config.messageSize, config.batchSize, (bytesSent*1.0)/(1024 * 1024), mbPerSec, nSends, numMessagesPerSec))
          Hide
          Neha Narkhede added a comment -

          This patch fixes a bunch of problems in the perf tools. I'll try to explain the changes at a high level, since I don't remember the exact list of little bugs that existed -

          1. Refactoring ProducerPerformance to include just one code path for both sync and async producer
          2. Fix bugs in ProducerPerformance to fix the throughput calculation and the logic for having multiple threads
          3. Fix various bugs in the consumer performance tools
          4. Fix the output of the tools to include useful info in the csv format
          5. Move all of the above to a perf sub project

          Show
          Neha Narkhede added a comment - This patch fixes a bunch of problems in the perf tools. I'll try to explain the changes at a high level, since I don't remember the exact list of little bugs that existed - 1. Refactoring ProducerPerformance to include just one code path for both sync and async producer 2. Fix bugs in ProducerPerformance to fix the throughput calculation and the logic for having multiple threads 3. Fix various bugs in the consumer performance tools 4. Fix the output of the tools to include useful info in the csv format 5. Move all of the above to a perf sub project

            People

            • Assignee:
              Neha Narkhede
              Reporter:
              Neha Narkhede
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development