[SPARK-26182] Cost increases when optimizing scalaUDF - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: Optimizer, SQL
Labels: None

Description

Suppose we have a UDF called splitUDF that returns a map.
The query

select
    g['a'], g['b']
from
    (select splitUDF(x) as g from table) tbl

is optimized to the same logical plan as

select splitUDF(x)['a'], splitUDF(x)['b'] from table

which means that splitUDF is executed twice instead of once.

The rewrite comes from the CollapseProject optimizer rule.
I'm not sure whether this is a bug. Please tell me if I am wrong about this.
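
The cost difference between the two query shapes can be illustrated with a toy model in plain Python. This is only a sketch of the reasoning, not Spark code: split_udf is a hypothetical stand-in for splitUDF, the 'a=1,b=2' input format is an assumption, and the call counter simply shows how often the function would run if each plan were evaluated literally. It says nothing about what Spark's code generation actually does at runtime.

```python
# Toy model of the two plan shapes; plain Python, not Spark.
from collections import Counter

calls = Counter()


def split_udf(x):
    """Hypothetical stand-in for splitUDF: parse 'a=1,b=2' into a dict."""
    calls["split_udf"] += 1
    return dict(pair.split("=", 1) for pair in x.split(","))


def run_nested(rows):
    # Models the query as written: compute g once per row, then read two keys.
    out = []
    for x in rows:
        g = split_udf(x)
        out.append((g["a"], g["b"]))
    return out


def run_collapsed(rows):
    # Models the plan after the two projections are merged:
    # the alias g is inlined, so the call appears once per output column.
    return [(split_udf(x)["a"], split_udf(x)["b"]) for x in rows]


rows = ["a=1,b=2", "a=3,b=4"]  # assumed sample input

calls.clear()
nested = run_nested(rows)
nested_calls = calls["split_udf"]

calls.clear()
collapsed = run_collapsed(rows)
collapsed_calls = calls["split_udf"]

print(nested == collapsed)            # True: both shapes return the same rows
print(nested_calls, collapsed_calls)  # 2 4
```

Both shapes produce the same result, so the rewrite is semantically valid; the only difference in this model is the number of calls, which is what the report is about.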

People

Assignee: Unassigned

Reporter: Jiayi Liao

Votes: 0

Watchers: 3

Dates

Created: 27/Nov/18 09:50

Updated: 17/May/20 17:58

Resolved: 10/May/19 04:15