SPARK-32819

Spark SQL aggregate() fails on nested string arrays

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.2, 3.1.0
    • Component/s: SQL
    • Labels: None
    • Environment: Spark 3.0.0 on Databricks (DBR 7.2)

      Description

      The aggregate() function appears to fail when the initial state is a nested array of strings, such as array(array('')). It works when the nested array holds INTs instead.

      Example that works with a simple array:

      select aggregate(split('abcdefgh',''), array(''), (acc, x) -> array( x ) )

      Example that works with nested array and INTs:

      select aggregate(sequence(0,9), array(array(0)), (acc, x) -> array(array( x ) ) )

      Example that errors:

      select aggregate(split('abcdefgh',''), array(array('')), (acc, x) -> array(array( x ) ) )

      Producing the following (shortened) error:

      data type mismatch: argument 3 requires array<array<string>> type, however ... is of array<array<string>> type.
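The message above reports a mismatch between two types that print identically. The report does not say why. One possible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the two types differ in a property the printed form leaves out, such as whether array elements may be null. The sketch below is a small pure-Python model of that idea; `ArrayType`, `type_string`, and `same_ignoring_nullability` are hypothetical names made up for illustration and are not Spark's classes or functions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical stand-in for an array type; not Spark's own class."""
    element: Union["ArrayType", str]
    contains_null: bool

    def type_string(self) -> str:
        # The printed form deliberately omits nullability.
        if isinstance(self.element, ArrayType):
            inner = self.element.type_string()
        else:
            inner = self.element
        return f"array<{inner}>"


def same_ignoring_nullability(a, b) -> bool:
    """Compare two types structurally, ignoring contains_null at every level."""
    if isinstance(a, ArrayType) and isinstance(b, ArrayType):
        return same_ignoring_nullability(a.element, b.element)
    return a == b


# Two nested string arrays that differ only in inner nullability (assumed values).
expected = ArrayType(ArrayType("string", contains_null=True), contains_null=False)
actual = ArrayType(ArrayType("string", contains_null=False), contains_null=False)

print(expected.type_string())                       # array<array<string>>
print(actual.type_string())                         # array<array<string>>
print(expected == actual)                           # False: strict comparison sees nullability
print(same_ignoring_nullability(expected, actual))  # True: recursive comparison ignores it
```

In this model a strict comparison rejects the pair even though both print as array<array<string>>, while a comparison that ignores nullability at every nesting level accepts it. Whether this matches the actual cause in Spark is not established by the report.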

       

            People

            • Assignee: L. C. Hsieh (viirya)
            • Reporter: Lauri Koobas (laurikoobas)
            • Votes: 0
            • Watchers: 4
