Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-839

KafkaSystemProducer should use the same partitioning hash function as Kafka's producer

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.9.1
    • None
    • kafka
    • None

    Description

      Samza's KafkaSystemProducer class generates the partition key using:

      abs(envelope.getPartitionKey.hashCode()) % numPartitions

      However, Kafka's producer generates the partition key this way:

      Utils.abs(Utils.murmur2(record.key())) % numPartitions

      This makes it difficult for me to join 2 data sources on a common key when one source is produced by Samza and the other by a default Kafka producer.

      As a work-around, I have to modify the upstream job (that uses the default kafka producer) to write with an explicit partition key using Samza's hashing logic.

      See https://docs.google.com/document/d/1iNSNQLDqCIfRiKz1MU3o6UMT_HxjgkgxpLhVcND4dYY/edit?usp=sharing for design.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              kishorenc Kishore Nallan
              Votes:
              1 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated: