Spark / SPARK-26182

Cost increases when optimizing scalaUDF


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Optimizer, SQL
    • Labels: None

    Description

      Let's assume we have a UDF called splitUDF that returns a map.
      The SQL

      select
          g['a'], g['b']
      from
         ( select splitUDF(x) as g from table) tbl
      

      is optimized into the same logical plan as

      select splitUDF(x)['a'], splitUDF(x)['b'] from table
      

      which means that splitUDF is executed twice per row instead of once.

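      This rewrite comes from the CollapseProject optimizer rule, which inlines the aliased expression g into every place the outer query references it.
      I'm not sure whether this is a bug or not. Please let me know if I am wrong about this.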
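      To make the duplication concrete, here is a toy Python model of the rewrite. It is not Spark's Catalyst code, and the classes Col, UDF, GetItem and Project and the helpers substitute, collapse_project and udf_calls_per_row are hypothetical illustrations. Collapsing two projections replaces every reference to the inner alias g with the expression splitUDF(x). As a result, the UDF appears once for each outer reference. The model assumes every occurrence of an expression in a projection is evaluated once per row, with no common-subexpression elimination.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Col:
    """Reference to a column, or to an alias produced by a child Project."""
    name: str

@dataclass(frozen=True)
class UDF:
    """Call of a (possibly expensive) user-defined function."""
    name: str
    arg: object

@dataclass(frozen=True)
class GetItem:
    """Map lookup such as g['a']."""
    child: object
    key: str

@dataclass
class Project:
    exprs: dict    # output alias -> expression
    child: object  # a nested Project, or a table name as a plain string

def substitute(expr, aliases):
    """Replace references to inner aliases with the aliased expressions."""
    if isinstance(expr, Col):
        return aliases.get(expr.name, expr)
    if isinstance(expr, UDF):
        return UDF(expr.name, substitute(expr.arg, aliases))
    if isinstance(expr, GetItem):
        return GetItem(substitute(expr.child, aliases), expr.key)
    raise TypeError(f"unknown expression: {expr!r}")

def collapse_project(outer):
    """Toy version of collapsing Project(Project(...)) into one Project."""
    inner = outer.child
    if not isinstance(inner, Project):
        return outer
    merged = {alias: substitute(e, inner.exprs) for alias, e in outer.exprs.items()}
    return Project(merged, inner.child)

def udf_calls(expr):
    """Count UDF occurrences inside one expression."""
    if isinstance(expr, UDF):
        return 1 + udf_calls(expr.arg)
    if isinstance(expr, GetItem):
        return udf_calls(expr.child)
    return 0

def udf_calls_per_row(plan):
    """Total UDF evaluations per input row across all nested projections."""
    total = sum(udf_calls(e) for e in plan.exprs.values())
    if isinstance(plan.child, Project):
        total += udf_calls_per_row(plan.child)
    return total

# select g['a'], g['b'] from (select splitUDF(x) as g from table) tbl
inner = Project({"g": UDF("splitUDF", Col("x"))}, "table")
outer = Project({"a": GetItem(Col("g"), "a"), "b": GetItem(Col("g"), "b")}, inner)
collapsed = collapse_project(outer)

print(udf_calls_per_row(outer))      # 1
print(udf_calls_per_row(collapsed))  # 2
```

      After collapsing, collapsed.exprs["a"] is GetItem(UDF("splitUDF", Col("x")), "a"), which corresponds to splitUDF(x)['a'] in the second query.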


          People

            Assignee: Unassigned
            Reporter: Jiayi Liao (wind_ljy)
            Votes: 0
            Watchers: 3
