Calcite / CALCITE-3021

Equality of nested ROWs returns false for identical values


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.19.0
    • Fix Version/s: 1.20.0
    • Component/s: None

    Description

      The problem can be reproduced with:

      select distinct * from (values
          (1, ROW(1,1)),
          (1, ROW(1,1)),
          (2, ROW(2,2))) as v(id,struct);
      

      This query incorrectly returns a duplicated row:

      +----+--------+
      | ID | STRUCT |
      +----+--------+
      |  1 | {1, 1} |
      |  1 | {1, 1} |
      |  2 | {2, 2} |
      +----+--------+
      (3 rows)
      

      The root cause is that ArrayEqualityComparer (the comparer used for JavaRowFormat.ARRAY) compares arrays with Arrays#equals and Arrays#hashCode (see Functions.java):

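        private static class ArrayEqualityComparer implements EqualityComparer<Object[]> {
          // Arrays.equals compares elements with Object#equals, so a nested
          // Object[] (e.g. a struct field) is compared by reference only.
          public boolean equal(Object[] v1, Object[] v2) {
            return Arrays.equals(v1, v2);
          }
          // Arrays.hashCode likewise uses each nested array's identity hash.
          public int hashCode(Object[] t) {
            return Arrays.hashCode(t);
          }
        }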
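      A minimal standalone sketch of the root cause, using only the JDK (no Calcite dependency). It assumes, as the description implies for JavaRowFormat.ARRAY, that a row is an Object[] whose struct field is itself a nested Object[]; the class name ShallowArrayEquality and the helper row() are hypothetical.

```java
import java.util.Arrays;

public class ShallowArrayEquality {
  // Hypothetical helper: builds (id, ROW(a, b)) with the struct held as a nested Object[].
  static Object[] row(int id, int a, int b) {
    return new Object[] {id, new Object[] {a, b}};
  }

  public static void main(String[] args) {
    Object[] row1 = row(1, 1, 1);
    Object[] row2 = row(1, 1, 1);
    Object[] struct1 = (Object[]) row1[1];
    Object[] struct2 = (Object[]) row2[1];

    // The nested arrays have identical contents...
    System.out.println(Arrays.equals(struct1, struct2)); // true
    // ...but arrays inherit Object#equals, which compares references.
    System.out.println(struct1.equals(struct2));         // false
    // Arrays.equals applies Object#equals to each element, so the rows differ.
    System.out.println(Arrays.equals(row1, row2));       // false
  }
}
```

      Because the outer comparison delegates to the nested arrays' reference-based equals, two rows that print identically, such as (1, {1, 1}), are treated as distinct.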
      

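      This leads to incorrect comparisons for nested arrays, e.g. a row (an Object[]) that contains a struct field (another Object[]): Arrays#equals compares the nested arrays with Object#equals, i.e. by reference, so two rows with identical contents compare as unequal and DISTINCT keeps both. To fix the issue, Arrays#deepEquals and Arrays#deepHashCode should be used: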

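        private static class ArrayEqualityComparer implements EqualityComparer<Object[]> {
          // Arrays.deepEquals recurses into nested arrays and compares their contents.
          public boolean equal(Object[] v1, Object[] v2) {
            return Arrays.deepEquals(v1, v2);
          }
          // Arrays.deepHashCode hashes nested arrays by content, consistent with deepEquals.
          public int hashCode(Object[] t) {
            return Arrays.deepHashCode(t);
          }
        }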
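      The effect of the fix on a hash-based DISTINCT can be sketched with plain JDK collections. This is an illustration under stated assumptions, not Calcite's actual DISTINCT implementation: RowComparer is a hypothetical local interface with the same two methods as the comparer above (it is not the real linq4j EqualityComparer type), Key is a hypothetical adapter that lets a LinkedHashSet use it, and ROW(...) values are assumed to be nested Object[] arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DistinctRowsSketch {
  // Hypothetical stand-in with the same two methods as the comparer above.
  interface RowComparer {
    boolean equal(Object[] v1, Object[] v2);
    int hashCode(Object[] t);
  }

  // Behaviour before the fix: only one level of the row is compared by content.
  static final RowComparer SHALLOW = new RowComparer() {
    public boolean equal(Object[] v1, Object[] v2) { return Arrays.equals(v1, v2); }
    public int hashCode(Object[] t) { return Arrays.hashCode(t); }
  };

  // Behaviour after the fix: nested arrays are compared and hashed by content.
  static final RowComparer DEEP = new RowComparer() {
    public boolean equal(Object[] v1, Object[] v2) { return Arrays.deepEquals(v1, v2); }
    public int hashCode(Object[] t) { return Arrays.deepHashCode(t); }
  };

  // Hypothetical adapter: routes equals/hashCode through a RowComparer so rows can live in a hash set.
  private static final class Key {
    final Object[] row;
    final RowComparer cmp;

    Key(Object[] row, RowComparer cmp) {
      this.row = row;
      this.cmp = cmp;
    }

    @Override public boolean equals(Object o) {
      return o instanceof Key && cmp.equal(row, ((Key) o).row);
    }

    @Override public int hashCode() {
      return cmp.hashCode(row);
    }
  }

  // Keeps the first occurrence of each row in input order, similar in spirit to SELECT DISTINCT.
  static List<Object[]> distinct(List<Object[]> rows, RowComparer cmp) {
    Set<Key> seen = new LinkedHashSet<>();
    for (Object[] r : rows) {
      seen.add(new Key(r, cmp));
    }
    List<Object[]> result = new ArrayList<>();
    for (Key k : seen) {
      result.add(k.row);
    }
    return result;
  }

  public static void main(String[] args) {
    // The three VALUES rows from the reproduction, with ROW(...) as a nested Object[] (assumption).
    List<Object[]> rows = Arrays.asList(
        new Object[] {1, new Object[] {1, 1}},
        new Object[] {1, new Object[] {1, 1}},
        new Object[] {2, new Object[] {2, 2}});
    System.out.println(distinct(rows, SHALLOW).size()); // 3: duplicate kept
    System.out.println(distinct(rows, DEEP).size());    // 2: duplicate removed
  }
}
```

      Both methods have to change together: a hash set first groups keys by hashCode and only then calls equal, so using deepEquals with the shallow Arrays.hashCode would still leave the duplicate rows in separate buckets in practice.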
      


    People

      Assignee: Ruben Q L (rubenql)
      Reporter: Ruben Q L (rubenql)


    Time Tracking

      Original Estimate: Not Specified
      Remaining Estimate: 0h
      Time Spent: 50m