The following query produces incorrect results. The query has two essential features: (1) it contains an aggregate over a string value, which results in a SortExec node, and (2) it contains a duplicate grouping key, which causes RemoveRepetitionFromGroupExpressions to produce a sort order stored as a Stream.
SELECT bigint_col_1, bigint_col_9, MAX(CAST(bigint_col_1 AS string))
GROUP BY bigint_col_1, bigint_col_9, bigint_col_9
When the sort order is stored as a Stream, the line ordering.map(_.child.genCode(ctx)) in GenerateOrdering#createOrderKeys() has unpredictable side effects on ctx. This is because genCode(ctx) mutates ctx, and mapping over a Stream is lazy: the mutations do not all happen immediately as intended, but occur later, whenever the returned Stream is traversed.
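The laziness can be reproduced outside Spark. The sketch below is illustrative only: FakeCtx and fakeGenCode are hypothetical stand-ins for CodegenContext and genCode, not Spark APIs. It shows that mapping a side-effecting function over a Stream defers most of the side effects until the result is traversed, so they interleave with mutations made in between.

```scala
import scala.collection.mutable.ArrayBuffer

object StreamLazinessDemo {
  // Hypothetical stand-in for CodegenContext: records mutations in order.
  class FakeCtx { val log: ArrayBuffer[String] = ArrayBuffer.empty[String] }

  // Hypothetical stand-in for genCode: mutates ctx and returns a value.
  def fakeGenCode(ctx: FakeCtx, name: String): String = {
    ctx.log += name
    s"code_$name"
  }

  // Returns (number of mutations visible right after map, final mutation order).
  def run(): (Int, List[String]) = {
    val ctx = new FakeCtx
    val ordering: Seq[String] = Stream("a", "b", "c")

    // Looks eager, but a Stream evaluates its tail lazily.
    val codes = ordering.map(fakeGenCode(ctx, _))
    val sizeAfterMap = ctx.log.size

    // An unrelated mutation that was meant to happen after all the others.
    ctx.log += "later"

    // Traversing the result finally triggers the deferred mutations.
    codes.foreach(_ => ())
    (sizeAfterMap, ctx.log.toList)
  }
}
```

In this sketch the "later" entry ends up ahead of the entries for the Stream's tail, which is the same kind of ordering hazard described above.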
Similar bugs have occurred at least three times in the past: https://issues.apache.org/jira/browse/SPARK-24500, https://issues.apache.org/jira/browse/SPARK-25767, https://issues.apache.org/jira/browse/SPARK-26680.
The fix is to check whether ordering is a Stream and, if so, force the modifications to happen immediately.
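One way such a check could look is sketched below. This is not the actual patch: mapEagerly is a hypothetical helper, and the demo uses a plain buffer in place of ctx. The idea is simply to force the mapped result when it is a Stream, so every side effect runs before the caller continues.

```scala
import scala.collection.mutable.ArrayBuffer

object EagerMapSketch {
  // Hypothetical helper: map, then force the result if it is a lazy Stream.
  def mapEagerly[A, B](xs: Seq[A])(f: A => B): Seq[B] = xs.map(f) match {
    case s: Stream[B] => s.force // evaluate every element now
    case other        => other   // strict collections are already evaluated
  }

  // Returns the order in which mutations were recorded.
  def demo(): List[String] = {
    val log = ArrayBuffer.empty[String] // stand-in for mutable ctx state
    val ordering: Seq[String] = Stream("a", "b", "c")

    val codes = mapEagerly(ordering) { name =>
      log += name
      s"code_$name"
    }

    log += "later"
    codes.foreach(_ => ()) // traversal triggers no further side effects
    log.toList
  }
}
```

Note that forcing the input Stream before mapping would not be enough, since map on a fully evaluated Stream still produces a lazy result; the forcing has to apply to the output of map (or the input has to be converted to a strict collection first).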