Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
0.9.1
-
None
-
None
Description
Samza's KafkaSystemProducer class generates the partition key using:
abs(envelope.getPartitionKey.hashCode()) % numPartitions
However, Kafka's producer generates the partition key this way:
Utils.abs(Utils.murmur2(record.key())) % numPartitions
This makes it difficult for me to join 2 data sources on a common key when one source is produced by Samza and the other by a default Kafka producer.
As a work-around, I have to modify the upstream job (that uses the default kafka producer) to write with an explicit partition key using Samza's hashing logic.
See https://docs.google.com/document/d/1iNSNQLDqCIfRiKz1MU3o6UMT_HxjgkgxpLhVcND4dYY/edit?usp=sharing for design.
Attachments
Issue Links
- contains
-
SAMZA-762 java arrays not usable as keys anymore
-
- Patch Available
-
- duplicates
-
SAMZA-1866 Invalid partition calculation in KafkaSystemProducer
-
- Resolved
-