Description
Run the following query:
with cte as (
select c1, c1, c2, c3 from t where random() > 0
)
select cte.c1, cte2.c1, cte.c2, cte2.c3 from
(select c1, c2 from cte) cte
inner join
(select c1, c3 from cte) cte2
on cte.c1 = cte2.c1
The query fails with the following error:
org.apache.spark.scheduler.DAGScheduler: Failed to update accumulator 9523 (Unknown class) for task 1
org.apache.spark.SparkException: attempted to access non-existent accumulator 9523
Further investigation shows that the rule PushdownPredicatesAndPruneColumnsForCTEDef creates an invalid plan when the output of a CTE contains duplicate expression IDs. That is the case here because the CTE selects c1 twice.
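To illustrate why duplicate expression IDs are hazardous for a column-pruning rule, here is a toy Python model. It is not Spark's implementation: `Attr`, `prune_def`, and `prune_ref` are hypothetical names, and the mismatch it shows is only one plausible way such a rule could produce an invalid plan. The sketch assumes the CTE definition is pruned by filtering its output, while the CTE reference is pruned by looking up the position of each needed ID. With a repeated ID, the two sides no longer agree on the number of columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for a plan attribute: a name plus an expression ID."""
    name: str
    expr_id: int


# Output of the CTE from the report: c1 appears twice with the same expression ID.
cte_output = [Attr("c1", 1), Attr("c1", 1), Attr("c2", 2), Attr("c3", 3)]


def prune_def(output, needed_ids):
    # Assumed definition-side pruning: keep every column whose ID is needed.
    return [a for a in output if a.expr_id in needed_ids]


def prune_ref(output, needed_ids):
    # Assumed reference-side pruning: find the first position of each needed ID.
    ids = [a.expr_id for a in output]
    return [output[ids.index(i)] for i in sorted(needed_ids)]


needed = {1, 2, 3}  # IDs referenced by the two subqueries over the CTE
def_cols = prune_def(cte_output, needed)
ref_cols = prune_ref(cte_output, needed)

# The definition keeps both copies of c1; the reference keeps only one.
print(len(def_cols), len(ref_cols))
```

In this toy model the pruned definition has four columns while the pruned reference expects three, which is the kind of inconsistency a later planning stage cannot resolve.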