Spark / SPARK-32819

Spark SQL aggregate() fails on nested string arrays


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.2, 3.1.0
    • Component/s: SQL
    • Labels: None
    • Environment: Spark 3.0.0 on Databricks (DBR 7.2)

    Description

      The aggregate() function seems to fail when the initial state is a nested array of strings, such as array(array('')). The same pattern seems to work when the elements are INTs.

      Example that works with a simple array:

      select aggregate(split('abcdefgh', ''), array(''), (acc, x) -> array(x))

      Example that works with a nested array of INTs:

      select aggregate(sequence(0, 9), array(array(0)), (acc, x) -> array(array(x)))

      Example that errors:

      select aggregate(split('abcdefgh', ''), array(array('')), (acc, x) -> array(array(x)))

      This produces the following (shortened) error:

      data type mismatch: argument 3 requires array<array<string>> type, however ... is of array<array<string>> type.
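      The two type strings in this error look identical. One plausible reading, offered only as an assumption, is that they differ in nullability flags that "array<array<string>>" does not print, and that the check ignores nullability only at the outermost array level. The sketch below is a hypothetical Python model of that reading, not Spark's source. The names Atomic, Array, simple_string and equals_structurally are illustrative. The nullability of each example's lambda result is also assumed: split elements are treated as nullable, while sequence elements and literals are treated as non-null.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified model of Spark SQL types (not Spark's real classes).
@dataclass(frozen=True)
class Atomic:
    name: str  # e.g. "string" or "int"


@dataclass(frozen=True)
class Array:
    element: "DataType"
    contains_null: bool  # may the array hold NULL elements?


DataType = Union[Atomic, Array]


def simple_string(t: DataType) -> str:
    """Render a type the way the error message does: nullability is not shown."""
    if isinstance(t, Array):
        return f"array<{simple_string(t.element)}>"
    return t.name


def equals_structurally(a: DataType, b: DataType,
                        ignore_nullability: bool, recursive: bool) -> bool:
    """Compare two types. With recursive=False, ignore_nullability applies
    only to the outermost array (the assumed buggy behaviour)."""
    if isinstance(a, Array) and isinstance(b, Array):
        inner_ignore = ignore_nullability and recursive
        return (equals_structurally(a.element, b.element, inner_ignore, recursive)
                and (ignore_nullability or a.contains_null == b.contains_null))
    return a == b


STRING, INT = Atomic("string"), Atomic("int")

# (label, initial-state type, lambda result type); all nullability flags are assumptions.
EXAMPLES = [
    # array('') vs array(x), where x comes from split() and is assumed nullable
    ("simple string array", Array(STRING, False), Array(STRING, True)),
    # array(array(0)) vs array(array(x)), where x comes from sequence() and is assumed non-null
    ("nested int array", Array(Array(INT, False), False), Array(Array(INT, False), False)),
    # array(array('')) vs array(array(x)), with x assumed nullable as above
    ("nested string array", Array(Array(STRING, False), False), Array(Array(STRING, True), False)),
]

for label, zero, merge in EXAMPLES:
    print(label, simple_string(zero), simple_string(merge),
          equals_structurally(zero, merge, ignore_nullability=True, recursive=False),
          equals_structurally(zero, merge, ignore_nullability=True, recursive=True))
```

      Under these assumptions, the non-recursive comparison rejects only the nested string case, matching the three reported examples. The two types still print identically. Passing the ignore-nullability setting down to nested element types makes all three cases pass.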

       

People

    • Assignee: L. C. Hsieh (viirya)
    • Reporter: Lauri Koobas (laurikoobas)
    • Votes: 0
    • Watchers: 4