Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-47525

Support subquery correlation joining on map attributes

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.5.0
    • 4.0.0
    • SQL

    Description

      Currently, when a subquery is correlated on a condition like `outer_map[1] = inner_map[1]`, DecorrelateInnerQuery generates a join on the map itself,
      which is unsupported, so the query cannot run - for example:
       

      scala> Seq(Map(0 -> 0)).toDF.createOrReplaceTempView("v")scala> sql("select v1.value[0] from v v1 where v1.value[0] > (select avg(v2.value[0]) from v v2 where v1.value[1] = v2.value[1])").explain
      org.apache.spark.sql.AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] Unsupported subquery expression: Correlated column reference 'v1.value' cannot be map type. SQLSTATE: 0A000; line 1 pos 49
      at org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedCorrelatedReferenceDataTypeError(QueryCompilationErrors.scala:2463)
      ... 

      However, if we rewrite the query to pull out the map access `outer_map[1]` into the outer plan, it succeeds:
       

      scala> sql("""with tmp as (
      select value[0] as value0, value[1] as value1 from v
      )
      select v1.value0 from tmp v1 where v1.value0 > (select avg(v2.value0) from tmp v2 where v1.value1 = v2.value1)""").explain

      Another point that can be improved is that, even if the data type supports join, we still don’t need to join on the full attribute, and we can get a better plan by doing the same rewrite to pull out the extract expression.

      Attachments

        Issue Links

          Activity

            People

              jchen5 Jack Chen
              jchen5 Jack Chen
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: