  Samza / SAMZA-2502

Byte array keys should be partitioned based on array contents in InMemorySystemProducer


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5
    • Component/s: None
    • Labels: None

    Description

      InMemorySystemProducer uses the hashCode of the partition key to decide which partition a message goes to. This works well when the key is an object whose hashCode method can be overridden. But when the partition key is serialized as a byte[], the message can go to any partition: a byte array inherits Object.hashCode, which is based on the identity of the array object (typically thought of as its address in memory) and not on its contents. Therefore, even though two messages may have the same key, they can be sent to different partitions once their keys are serialized into separate byte[] instances, whose hash codes are effectively arbitrary.

      We want to be able to partition messages based on the contents of their partition keys. An easy fix would be to compute the hash code with Arrays.hashCode(byte[]) whenever the key is a byte array, so that the hash code depends on the array's contents rather than on its identity.
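
The proposed fix can be illustrated with a minimal sketch. This is not the actual InMemorySystemProducer code: the class name PartitionKeyHashing and the helpers contentHashCode and partitionFor are hypothetical, the key is assumed to be non-null, and the use of Math.floorMod to keep the result non-negative is an assumption about how a hash code might be mapped to a partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartitionKeyHashing {

    // Hypothetical helper (not Samza's API): hash byte[] keys by content,
    // and every other key through its own hashCode().
    static int contentHashCode(Object partitionKey) {
        if (partitionKey instanceof byte[]) {
            return Arrays.hashCode((byte[]) partitionKey);
        }
        return partitionKey.hashCode();
    }

    // Maps a non-null key to a partition index in [0, partitionCount).
    // floorMod keeps the result non-negative for negative hash codes.
    static int partitionFor(Object partitionKey, int partitionCount) {
        return Math.floorMod(contentHashCode(partitionKey), partitionCount);
    }

    public static void main(String[] args) {
        // Two distinct arrays holding the same serialized key.
        byte[] first = "user-42".getBytes(StandardCharsets.UTF_8);
        byte[] second = "user-42".getBytes(StandardCharsets.UTF_8);

        // Identity-based hash codes: not tied to the contents,
        // so two equal arrays will usually disagree here.
        System.out.println(first.hashCode() == second.hashCode());

        // Content-based hash codes: equal contents give equal hashes.
        System.out.println(Arrays.hashCode(first) == Arrays.hashCode(second)); // prints true

        // Both messages therefore land in the same partition.
        System.out.println(partitionFor(first, 8) == partitionFor(second, 8)); // prints true
    }
}
```

With this approach, keys that are not byte arrays keep their existing behavior, so only the byte[] case changes.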


          People

            YixingZhang Yixing Zhang


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 40m
