  1. Spark
  2. SPARK-34714

collect_list(struct()) fails when used with GROUP BY

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.1.2
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      Databricks Runtime 8.0

      Description

      The following query fails in DBR 8.0 / Spark 3.1.1, but works in earlier DBR and Spark versions:

      with step_1 as (
          select 'E' as name, named_struct('subfield', 1) as field_1
      )
      select name, collect_list(struct(field_1.subfield))
      from step_1
      group by 1

      Fails with the following error message:

      AnalysisException: cannot resolve 'struct(step_1.`field_1`.`subfield`)' due to data type mismatch: Only foldable string expressions are allowed to appear at odd position, got: NamePlaceholder

      If you modify the query in any of the following ways, it works:

      • if you remove the field "name" and the "group by 1" part of the query
      • if you remove the "struct()" from within the collect_list()
      • if you use "named_struct()" instead of "struct()" within the collect_list()

      collect_set() is broken in the same way, and possibly other related functions are too, but I haven't done thorough testing.
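
      For reference, the named_struct() variant mentioned in the list above might look like the sketch below. It only swaps struct() for named_struct() with an explicit field name; the column alias "items" is added for illustration and is not part of the original query.

      ```sql
      -- Sketch of the named_struct() workaround described in this report.
      -- The alias "items" is hypothetical, added only for readability.
      with step_1 as (
          select 'E' as name, named_struct('subfield', 1) as field_1
      )
      select name,
             -- explicit field name instead of letting struct() infer it
             collect_list(named_struct('subfield', field_1.subfield)) as items
      from step_1
      group by 1
      ```

      Supplying the field name as a string literal appears to avoid the NamePlaceholder named in the error message, which would be consistent with this variant working.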

            People

            • Assignee: Unassigned
            • Reporter: Lauri Koobas (laurikoobas)
            • Votes: 0
            • Watchers: 4
