Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
1.19.0
None
Description
The problem can be reproduced with:
select distinct * from (values (1, ROW(1,1)), (1, ROW(1,1)), (2, ROW(2,2))) as v(id,struct);
It incorrectly returns a duplicate row:
+----+--------+
| ID | STRUCT |
+----+--------+
| 1  | {1, 1} |
| 1  | {1, 1} |
| 2  | {2, 2} |
+----+--------+
(3 rows)
The root cause is that ArrayEqualityComparer (the comparer used for JavaRowFormat.ARRAY) currently compares arrays with Arrays#equals and Arrays#hashCode (see Functions.java):
private static class ArrayEqualityComparer implements EqualityComparer<Object[]> {
  public boolean equal(Object[] v1, Object[] v2) {
    return Arrays.equals(v1, v2);
  }

  public int hashCode(Object[] t) {
    return Arrays.hashCode(t);
  }
}
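A minimal standalone sketch (plain Java, no Calcite dependency) of why the shallow comparison fails. It assumes, as the issue describes, that a row is an Object[] and its struct field is a separate nested Object[]; the class name ShallowVsDeep and the helper row() are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical demo class; not part of Calcite. */
public class ShallowVsDeep {
    // Builds the row (1, ROW(1, 1)) as an Object[] whose struct field is a
    // fresh nested Object[] (assumed representation, per the issue text).
    static Object[] row() {
        return new Object[] {1, new Object[] {1, 1}};
    }

    public static void main(String[] args) {
        Object[] a = row();
        Object[] b = row();
        // Arrays.equals compares elements with Object.equals; for the nested
        // arrays that is reference identity, so this prints false.
        System.out.println("Arrays.equals:      " + Arrays.equals(a, b));
        // Arrays.deepEquals recurses into nested arrays, so this prints true.
        System.out.println("Arrays.deepEquals:  " + Arrays.deepEquals(a, b));
        // Arrays.deepHashCode is consistent with deepEquals, so this prints true.
        System.out.println("deepHashCode match: "
            + (Arrays.deepHashCode(a) == Arrays.deepHashCode(b)));
    }
}
```

Arrays#hashCode has the same weakness: it uses the nested arrays' identity hash codes, so logically equal rows usually land in different hash buckets as well.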
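Because Arrays#equals and Arrays#hashCode do not recurse into nested arrays, comparisons are incorrect for multidimensional arrays, e.g. a row (an array) containing a struct field (another array): the nested arrays are compared by reference, so two rows with identical struct values are treated as different. To fix the issue, use Arrays#deepEquals and Arrays#deepHashCode instead: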
private static class ArrayEqualityComparer implements EqualityComparer<Object[]> {
  // deepEquals/deepHashCode recurse into nested arrays (e.g. struct fields)
  public boolean equal(Object[] v1, Object[] v2) {
    return Arrays.deepEquals(v1, v2);
  }

  public int hashCode(Object[] t) {
    return Arrays.deepHashCode(t);
  }
}
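To see how the comparer choice surfaces as the wrong DISTINCT result, here is a hypothetical sketch of a hash-based DISTINCT whose keys delegate equals/hashCode to a comparer. It is not Calcite's actual DISTINCT implementation; the Comparer interface is a local stand-in with the same two methods as the comparer above, and DistinctSketch, Key, distinct and values are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not Calcite's DISTINCT implementation. */
public class DistinctSketch {
    /** Local stand-in mirroring the equal/hashCode pair shown above. */
    interface Comparer<T> {
        boolean equal(T v1, T v2);
        int hashCode(T t);
    }

    // Behaves like the original (shallow) comparer.
    static final Comparer<Object[]> SHALLOW = new Comparer<Object[]>() {
        public boolean equal(Object[] v1, Object[] v2) { return Arrays.equals(v1, v2); }
        public int hashCode(Object[] t) { return Arrays.hashCode(t); }
    };

    // Behaves like the fixed (deep) comparer.
    static final Comparer<Object[]> DEEP = new Comparer<Object[]>() {
        public boolean equal(Object[] v1, Object[] v2) { return Arrays.deepEquals(v1, v2); }
        public int hashCode(Object[] t) { return Arrays.deepHashCode(t); }
    };

    /** Wraps a row so that a HashSet uses the comparer for equals/hashCode. */
    static final class Key {
        final Object[] row;
        final Comparer<Object[]> comparer;

        Key(Object[] row, Comparer<Object[]> comparer) {
            this.row = row;
            this.comparer = comparer;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Key && comparer.equal(row, ((Key) o).row);
        }

        @Override public int hashCode() {
            return comparer.hashCode(row);
        }
    }

    /** Keeps the first occurrence of each row, preserving input order. */
    static List<Object[]> distinct(List<Object[]> rows, Comparer<Object[]> comparer) {
        Set<Key> seen = new LinkedHashSet<>();
        for (Object[] row : rows) {
            seen.add(new Key(row, comparer));
        }
        List<Object[]> result = new ArrayList<>();
        for (Key key : seen) {
            result.add(key.row);
        }
        return result;
    }

    /** The VALUES rows from the reproduction query, as nested Object[]s. */
    static List<Object[]> values() {
        return Arrays.asList(
            new Object[] {1, new Object[] {1, 1}},
            new Object[] {1, new Object[] {1, 1}},
            new Object[] {2, new Object[] {2, 2}});
    }

    public static void main(String[] args) {
        System.out.println("shallow: " + distinct(values(), SHALLOW).size() + " rows");
        System.out.println("deep:    " + distinct(values(), DEEP).size() + " rows");
    }
}
```

With the shallow comparer all three rows survive, matching the incorrect output above; with the deep comparer the duplicate (1, {1, 1}) row is removed and two rows remain.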
Issue Links
- relates to: CALCITE-3482 Equality of nested ROWs returns false for identical literal value (Closed)
- links to