Spark / SPARK-38530

GeneratorNestedColumnAliasing does not work correctly for some expressions

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.1
    • Fix Version/s: 3.3.0
    • Component/s: Optimizer
    • Labels: None

    Description

      https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala#L226

      The code that collects ExtractValue expressions is wrong. It should collect them bottom-up instead of checking only two levels. The current logic can produce incorrect results when the expression has the form ExtractValue(ExtractValue(some_other_expr)).

       

      An example that triggers the bug:

      input: <col1: array<struct<a: int, b: struct<a: struct<a: int, b: int>, b: int>>>>

      Project(ExtractValue(ExtractValue(CaseWhen([col.a == 1, col.b]), "a"), "a"))
        - Generate(Explode(col1))

       

      The optimizer incorrectly tries to push the whole expression down into the input of the Explode. The CaseWhen then receives array<...> as its input instead of a single struct element, so the query returns a wrong result.
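
      The difference between a two-level check and a bottom-up collection can be sketched with a small toy model. This is not Spark's code: the classes Attr, GetField and Other, and the functions collect_two_levels and collect_bottom_up, are hypothetical names invented for illustration, with GetField standing in for ExtractValue and Other standing in for any non-extractor expression such as CaseWhen.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for a generator output attribute, e.g. `col`."""
    name: str


@dataclass(frozen=True)
class GetField:
    """Toy stand-in for an ExtractValue expression: child.field."""
    child: object
    field: str


@dataclass(frozen=True)
class Other:
    """Toy stand-in for any non-extractor expression, e.g. CaseWhen."""
    name: str
    children: tuple = ()


def collect_two_levels(e) -> List[object]:
    """Simplified model of a check that inspects only the direct child."""
    if isinstance(e, Attr):
        return [e]
    if isinstance(e, GetField):
        if isinstance(e.child, (GetField, Attr)):
            # Accepted without looking at what sits below e.child.
            return [e]
        return collect_two_levels(e.child)
    return [x for c in e.children for x in collect_two_levels(c)]


def collect_bottom_up(e) -> Tuple[List[object], bool]:
    """Validate children first; return (collected, e_is_pure_chain)."""
    if isinstance(e, Attr):
        return [e], True
    if isinstance(e, GetField):
        collected, child_is_chain = collect_bottom_up(e.child)
        if child_is_chain:
            # The chain reaches an attribute, so e extends it.
            return [e], True
        # Something else sits below; keep only what the child collected.
        return collected, False
    collected = []
    for c in e.children:
        collected.extend(collect_bottom_up(c)[0])
    return collected, False


# The expression from the example above, built on the Explode output `col`:
# ExtractValue(ExtractValue(CaseWhen([col.a == 1, col.b]), "a"), "a")
col = Attr("col")
case_when = Other("CaseWhen", (
    Other("EqualTo", (GetField(col, "a"), Other("Literal(1)"))),
    GetField(col, "b"),
))
expr = GetField(GetField(case_when, "a"), "a")

two_level_result = collect_two_levels(expr)
bottom_up_result, _ = collect_bottom_up(expr)
print(two_level_result == [expr])
print(bottom_up_result)
```

      In this model the two-level check returns the whole expression, CaseWhen included, as a candidate for pushdown, while the bottom-up version returns only col.a and col.b, the extractor chains that actually end in an attribute.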

          People

            Assignee: miny Min Yang
            Reporter: miny Min Yang