Details
- Type: Bug
- Status: In Progress
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.2.1, 3.3.1, 3.4.0
Description
The "prune unrequired references" branch has the condition:
case p @ Project(_, g: Generate) if p.references != g.outputSet =>
This is wrong: a generator such as Inline always enters this branch whenever the project does not use all of the generator's output.
Example:
input: <col1: array<struct<a: struct<a: int, b: int>, b: int>>>
Project(a.a as x)
- Generate(Inline(col1), ..., a, b)
p.references is [a]
g.outputSet is [a, b]
Because of this bug, such plans never reach the GeneratorNestedColumnAliasing branch below, so some optimization opportunities are missed. The condition should instead check whether any required child output is left unreferenced by the project:
g.requiredChildOutput.exists(!p.references.contains(_))
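To make the difference between the two conditions concrete, here is a minimal sketch that models the plan nodes with plain Scala case classes. These are simplified stand-ins, not the real Catalyst classes: attributes are plain strings, and the names `PruneCondition`, `current`, `proposed`, `inlineExample` and `passThrough` are hypothetical. The intended check is written with `exists`, on the assumption that it should fire when at least one required child output is not referenced by the project.

```scala
// Simplified stand-ins for Catalyst's Generate and Project; NOT the real Spark API.
// Attributes are modelled as plain strings (an assumption made for illustration).
final case class Generate(
    generatorOutput: Seq[String],     // attributes produced by the generator
    requiredChildOutput: Seq[String]  // child attributes passed through by Generate
) {
  // Modelled as the pass-through child attributes plus the generator output.
  def outputSet: Set[String] = (requiredChildOutput ++ generatorOutput).toSet
}

final case class Project(references: Set[String], child: Generate)

object PruneCondition {
  // Condition quoted in the report: fires whenever the project does not
  // reference exactly the full output of Generate.
  def current(p: Project): Boolean =
    p.references != p.child.outputSet

  // Intended condition: fires only if some pass-through child attribute
  // is not referenced by the project.
  def proposed(p: Project): Boolean =
    p.child.requiredChildOutput.exists(!p.references.contains(_))

  // The example from the report: Inline(col1) produces a and b, the project
  // references only a, and no child attribute is passed through.
  val inlineExample: Project =
    Project(Set("a"), Generate(Seq("a", "b"), requiredChildOutput = Seq.empty))

  // A case where pruning is warranted: col1 is passed through but unused.
  val passThrough: Project =
    Project(Set("a"), Generate(Seq("a"), requiredChildOutput = Seq("col1")))
}
```

In this model, `current` is true for both plans, while `proposed` is true only for `passThrough`, so the Inline example would fall through to the later branch.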
Issue Links
- causes SPARK-39612 The dataframe returned by exceptAll() can no longer perform operations such as count() or isEmpty(), or an exception will be thrown. (Resolved)
- links to