Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
1.5.0
None
Description
HashPartitioning compatibility is defined with respect to the set of hash expressions, but in other contexts, such as computing the hash code of a row, the ordering of those expressions matters. The following regression test illustrates the inconsistency:
test("HashPartitioning compatibility") {
  val expressions = Seq(Literal(2), Literal(3))
  // Consider two HashPartitionings that have the same _set_ of hash expressions but which are
  // created with different orderings of those expressions:
  val partitioningA = HashPartitioning(expressions, 100)
  val partitioningB = HashPartitioning(expressions.reverse, 100)
  // These partitionings are not considered equal:
  assert(partitioningA != partitioningB)
  // However, they both satisfy the same clustered distribution:
  val distribution = ClusteredDistribution(expressions)
  assert(partitioningA.satisfies(distribution))
  assert(partitioningB.satisfies(distribution))
  // Both partitionings are compatible with and guarantee each other:
  assert(partitioningA.compatibleWith(partitioningB))
  assert(partitioningB.compatibleWith(partitioningA))
  assert(partitioningA.guarantees(partitioningB))
  assert(partitioningB.guarantees(partitioningA))
  // Given all of this, we would expect these partitionings to compute the same hashcode for
  // any given row:
  def computeHashCode(partitioning: HashPartitioning): Int = {
    val hashExprProj = new InterpretedMutableProjection(partitioning.expressions, Seq.empty)
    hashExprProj.apply(InternalRow.empty).hashCode()
  }
  assert(computeHashCode(partitioningA) === computeHashCode(partitioningB))
}
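The last assertion in the test can fail because a hash over a sequence of values is normally sensitive to their order, while set equality is not. The following is a minimal, Spark-free sketch of that effect. The names `OrderSensitiveHashSketch`, `rowHash`, and `partitionId` are hypothetical, and the 31-based fold is only an illustrative stand-in, not the hash function Spark actually uses.

```scala
object OrderSensitiveHashSketch {
  // Hypothetical stand-in for hashing the evaluated hash expressions of one row.
  // This is NOT Spark's hash function; it is a simple order-sensitive fold.
  def rowHash(values: Seq[Int]): Int =
    values.foldLeft(0)((h, v) => 31 * h + v)

  // Hypothetical partition assignment: non-negative modulus of the row hash.
  def partitionId(values: Seq[Int], numPartitions: Int): Int = {
    val mod = rowHash(values) % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  def main(args: Array[String]): Unit = {
    val a = Seq(2, 3)   // mirrors Seq(Literal(2), Literal(3))
    val b = a.reverse   // same set of values, different order

    println(a.toSet == b.toSet)   // true: identical as sets
    println(rowHash(a))           // 65 = 31 * 2 + 3
    println(rowHash(b))           // 95 = 31 * 3 + 2
    println(partitionId(a, 100))  // 65
    println(partitionId(b, 100))  // 95
  }
}
```

Under these assumptions, two partitionings that agree on the set of expressions would still route the same row to different partitions, which is why a compatibility check based on the set alone is too weak.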