Kafka / KAFKA-1516

Producer Performance Test sends messages with bytes of 0x0

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.8.1.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

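      The producer performance test in Kafka sends messages made up entirely of 0x0 bytes (https://github.com/apache/kafka/blob/0.8.1/perf/src/main/scala/kafka/perf/ProducerPerformance.scala#L237) or entirely of the character X (https://github.com/apache/kafka/blob/0.8.1/perf/src/main/scala/kafka/perf/ProducerPerformance.scala#L225). Such uniform payloads compress far better than real data, which skews the compression ratio massively and probably affects performance in other ways.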

      We want the test to create messages that give a more realistic performance profile. Random bytes may not be the best solution, as they won't compress at all and will skew compression times.
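
      To make the two extremes concrete, here is a minimal sketch that compares how constant and random payloads compress. It is an illustration only: Python's zlib (DEFLATE) stands in for Kafka's compression codecs, and the 1000-byte size and the compression_ratio helper are assumptions, not part of the perf tool.

```python
import os
import zlib


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(payload)) / len(payload)


size = 1000  # hypothetical message size in bytes

payloads = [
    ("zeros", bytes(size)),        # all 0x0 bytes
    ("all X", b"X" * size),        # all X's
    ("random", os.urandom(size)),  # random bytes
]

for name, payload in payloads:
    print(f"{name:>6}: ratio {compression_ratio(payload):.3f}")
```

      The constant payloads shrink to a small fraction of their size, while the random payload does not shrink at all, so neither resembles typical message data.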

      Perhaps a template that injects random or sequential data could work. Or maybe I'm overthinking it and we should just go for random bytes. What other options do we have? Other tools, such as cassandra-stress (https://github.com/zznate/cassandra-stress/blob/master/src/main/java/com/riptano/cassandra/stress/InsertCommand.java#L39), seem to use random bytes.
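
      One way the template idea might look is sketched below: a fixed record structure with a sequential id and a few random fields, so that payloads are partly repetitive and partly unpredictable. The field names, the JSON encoding, and the make_message helper are all hypothetical and chosen only for illustration; zlib again stands in for Kafka's codecs.

```python
import json
import random
import zlib


def make_message(seq: int, rng: random.Random) -> bytes:
    """Fill a fixed template with sequential and random values (hypothetical fields)."""
    record = {
        "id": seq,                                    # sequential data
        "user": f"user-{rng.randrange(10_000)}",      # random, low cardinality
        "event": rng.choice(["click", "view", "purchase"]),
        "token": f"{rng.getrandbits(64):016x}",       # random, high entropy
    }
    return json.dumps(record).encode("utf-8")


rng = random.Random(42)  # seeded so repeated runs produce the same payloads
batch = b"\n".join(make_message(i, rng) for i in range(1000))
ratio = len(zlib.compress(batch)) / len(batch)
print(f"template batch ratio: {ratio:.3f}")
```

      The resulting ratio falls between the constant-byte and random-byte extremes, and changing the mix of fixed and random fields would shift it in either direction.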

        Activity

        Daniel Compton created issue -
        Daniel Compton made changes -
        Daniel Compton made changes -

          People

          • Assignee: Unassigned
          • Reporter: Daniel Compton
          • Votes: 0
          • Watchers: 1
