Flink / FLINK-5874

Reject arrays as keys in DataStream API to avoid inconsistent hashing


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.1.4, 1.2.0
    • Fix Version/s: None
    • Component/s: API / DataStream
    • Labels: None

    Description

This issue has been reported on the mailing list twice.

      The problem is the following: when shuffling data, we compute the hash simply via the key's hashCode(), i.e. Key[].hashCode() for array keys. Java's default hashCode() implementation does not take an array's contents into account; it is based on the object's identity (memory address).
      This leads to different hash codes on the sender and the receiver side.
      In Flink 1.1 this means the data is shuffled randomly rather than keyed; in Flink 1.2 the key-groups code detects the hashing violation.
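To make the failure mode concrete, here is a minimal, self-contained JDK snippet (the class name and "sender"/"receiver" variable names are illustrative) contrasting the identity-based hashCode() inherited by arrays with content-based Arrays.hashCode():

```java
import java.util.Arrays;

public class ArrayHashDemo {
    public static void main(String[] args) {
        int[] onSender = {1, 2, 3};
        int[] onReceiver = {1, 2, 3}; // same contents, different object

        // Arrays inherit Object.hashCode(), which is identity-based:
        // two distinct instances with equal contents virtually never
        // produce the same hash code -- exactly the sender/receiver
        // mismatch described above.
        System.out.println(onSender.hashCode() == onReceiver.hashCode());

        // Content-based hashing, which a keyed shuffle actually needs:
        System.out.println(Arrays.hashCode(onSender)
                == Arrays.hashCode(onReceiver)); // prints "true"
    }
}
```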

The proper fix would be to rely on Flink's TypeComparator class, which provides a type-specific hash function. Introducing that change now, however, would break compatibility with existing code.
      I'll file a JIRA to apply that fix in 2.0.
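What type-specific hashing buys can be sketched on the user side: wrapping the array in a key class whose equals()/hashCode() are content-based makes every JVM compute the same hash for the same contents. ArrayKey below is a hypothetical helper, not a Flink class:

```java
import java.util.Arrays;

// Hypothetical workaround sketch: a content-aware wrapper key.
// equals() and hashCode() delegate to java.util.Arrays, so equal
// contents hash identically on sender and receiver.
final class ArrayKey {
    private final int[] values;

    ArrayKey(int[] values) {
        this.values = values;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ArrayKey
                && Arrays.equals(values, ((ArrayKey) o).values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }
}
```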

      For 1.2.1 and 1.3.0 we should at least reject arrays as keys.
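The proposed rejection could look roughly like the following sketch (class name, method name, and message are illustrative assumptions, not Flink's actual validation code): any array-typed key is refused up front instead of silently producing an inconsistent shuffle.

```java
// Hypothetical sketch of the proposed key-type check.
public class KeyValidation {
    static void ensureNotArray(Class<?> keyType) {
        if (keyType.isArray()) {
            throw new IllegalArgumentException(
                    "Array types are not supported as keys: their hashCode() "
                    + "is identity-based and differs across JVM instances. "
                    + "Use a content-aware wrapper or derive a deterministic "
                    + "key instead.");
        }
    }
}
```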

      Attachments

        Issue Links

        Activity


          People

            Assignee: kkl0u Kostas Kloudas
            Reporter: rmetzger Robert Metzger
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
