  1. Crunch (Retired)
  2. CRUNCH-528

Pair: Integer overflow during comparison can cause inconsistent sort.


Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • None
    • 0.13.0
    • Core
    • None
    • Patch

    Description

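      Pair falls back to comparing the hash codes of its values when the values are not themselves Comparable. The comparison subtracts one hash code from the other, so when the two hash codes are far enough apart (their true difference does not fit in an int), the subtraction overflows and the result wraps to the wrong sign. The resulting comparison function is not transitive.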
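
      The wrap-around can be reproduced with plain ints. The sketch below is not the Crunch source: `subtractCompare` is a hypothetical stand-in for a hash-code comparison by subtraction, and `safeCompare` shows one common overflow-safe alternative (`Integer.compare`), which may or may not match what the attached patches do. The three values stand for hash codes.

```java
class HashCompareSketch {
    // Hypothetical stand-in for a hash-code fallback comparison (not the actual Crunch code).
    // The int subtraction wraps when the true difference does not fit in 32 bits.
    static int subtractCompare(int lhsHash, int rhsHash) {
        return lhsHash - rhsHash;
    }

    // Overflow-safe alternative: Integer.compare never subtracts its arguments.
    static int safeCompare(int lhsHash, int rhsHash) {
        return Integer.compare(lhsHash, rhsHash);
    }

    public static void main(String[] args) {
        int a = -2_000_000_000;
        int b = 0;
        int c = 2_000_000_000;

        // a < b and b < c according to the subtraction-based comparison ...
        System.out.println(subtractCompare(a, b) < 0); // true
        System.out.println(subtractCompare(b, c) < 0); // true
        // ... but a - c wraps to a positive number, so a is reported as greater than c.
        System.out.println(subtractCompare(a, c) < 0); // false
        // The overflow-safe comparison keeps the ordering transitive.
        System.out.println(safeCompare(a, c) < 0);     // true
    }
}
```

      Here a sorts before b and b sorts before c, yet a sorts after c, which is exactly the broken transitivity described above.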

      Among other things, this breaks Joins in the in-memory pipeline, because the in-memory shuffler uses a TreeMap if the key type is Comparable. Since the key in a join is a Pair of the original key and a join tag, the key is always Comparable. With a non-transitive comparison function, the two join tags of an original key can sort differently, so they are no longer adjacent for that key. As a result, either the cross product erroneously produces no values in the case of an inner join, or null values appear where they should not in the case of an outer join.
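
      The effect on a TreeMap can be illustrated without Crunch. The sketch below does not reproduce the in-memory shuffler or the join failure itself; it only shows, under the assumption of a subtraction-based comparator (the hypothetical `BY_SUBTRACTION`), that the iteration order of a TreeMap over the same three keys depends on the order in which they were inserted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class InconsistentOrderSketch {
    // Hypothetical comparator that orders keys by subtraction,
    // standing in for a hash-code comparison that can overflow.
    static final Comparator<Integer> BY_SUBTRACTION = (x, y) -> x - y;

    // Inserts the keys in the given order and returns them in TreeMap iteration order.
    static List<Integer> sortedKeys(Comparator<Integer> cmp, int... insertionOrder) {
        TreeMap<Integer, String> map = new TreeMap<>(cmp);
        for (int key : insertionOrder) {
            map.put(key, "value");
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        int a = -2_000_000_000;
        int b = 0;
        int c = 2_000_000_000;

        // Same keys, different insertion order, different "sorted" order.
        System.out.println(sortedKeys(BY_SUBTRACTION, a, b, c));
        System.out.println(sortedKeys(BY_SUBTRACTION, b, a, c));

        // With a transitive comparator the insertion order does not matter.
        System.out.println(sortedKeys(Comparator.naturalOrder(), a, b, c));
        System.out.println(sortedKeys(Comparator.naturalOrder(), b, a, c));
    }
}
```

      With the subtraction-based comparator, inserting a, b, c yields the order c, a, b, while inserting b, a, c yields a, b, c. Once the order depends on insertion history like this, there is no guarantee that related keys end up next to each other.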

      As a workaround, ensure that the key used in a Join is comparable.
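
      A minimal sketch of such a key, in plain Java, might look like the following. `UserKey` and its `id` field are hypothetical names chosen for illustration; a key used in a real Crunch pipeline would also need a suitable PType, which is omitted here. Because the class implements Comparable, a comparison by hash code is never needed for it.

```java
import java.util.Objects;

// Hypothetical join key that defines its own ordering instead of relying on hash codes.
final class UserKey implements Comparable<UserKey> {
    private final String id;

    UserKey(String id) {
        this.id = Objects.requireNonNull(id, "id");
    }

    String getId() {
        return id;
    }

    // String.compareTo is a total, transitive ordering, so no overflow can occur.
    @Override
    public int compareTo(UserKey other) {
        return id.compareTo(other.id);
    }

    // equals and hashCode are kept consistent with compareTo.
    @Override
    public boolean equals(Object o) {
        return o instanceof UserKey && id.equals(((UserKey) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    public static void main(String[] args) {
        UserKey alice = new UserKey("alice");
        UserKey bob = new UserKey("bob");
        System.out.println(alice.compareTo(bob) < 0);                 // true
        System.out.println(alice.compareTo(new UserKey("alice")));    // 0
    }
}
```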

      Attachments

        1. 0001-Pair-Fix-comparison-for-large-hash-codes.patch
          3 kB
          Brandon Vargo
        2. CRUNCH-528.2.patch
          5 kB
          Gabriel Reid

        Activity

          People

            gabriel.reid Gabriel Reid
            bvargo Brandon Vargo
