[SPARK-32819] Spark SQL aggregate() fails on nested string arrays - ASF JIRA

The aggregate() function fails when the initial state is a nested array of strings, i.e. array(array(<some string>)). It works when the nested array holds INTs, for example.

Example that works with a simple array:

select aggregate(split('abcdefgh',''), array(''), (acc, x) -> array( x ) )

Example that works with nested array and INTs:

select aggregate(sequence(0,9), array(array(0)), (acc, x) -> array(array( x ) ) )

Example that errors:

select aggregate(split('abcdefgh',''), array(array('')), (acc, x) -> array(array( x ) ) )

Producing the following (shortened) error:

data type mismatch: argument 3 requires array<array<string>> type, however ... is of array<array<string>> type.
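
The message is confusing because the required and the actual type print identically. One possible explanation, which the report itself does not confirm, is that the two types differ only in a nullability flag on the inner array, which the printed type name does not show, and that the type check ignores nullability at the outermost level only. The sketch below is a toy Python model of that hypothesis. It is not Spark code: `ArrayT`, `render`, `same_type` and the chosen flag values are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ArrayT:
    """Toy stand-in for an array type: element type plus a nullability flag."""
    element: Union["ArrayT", str]
    contains_null: bool


def render(t):
    """Render a type name the way the error message does: no nullability shown."""
    if isinstance(t, ArrayT):
        return f"array<{render(t.element)}>"
    return t


def same_type(a, b, recursive):
    """Compare two types while ignoring nullability.

    With recursive=False the flag is ignored only at the outermost level;
    nested element types are compared strictly, flags included.
    """
    if isinstance(a, ArrayT) and isinstance(b, ArrayT):
        if recursive:
            return same_type(a.element, b.element, True)
        return a.element == b.element
    return a == b


# Assumed flags: the initial state has a non-nullable inner array,
# the lambda result has a nullable one.
zero = ArrayT(ArrayT("string", False), False)
merged = ArrayT(ArrayT("string", True), False)

# Flat variants of the same situation (the "simple array" case).
flat_zero = ArrayT("string", False)
flat_merged = ArrayT("string", True)
```

Under these assumptions `render(zero)` and `render(merged)` are the same string, `same_type(zero, merged, recursive=False)` is false, and `same_type(flat_zero, flat_merged, recursive=False)` is true, which matches the pattern in the report: the flat case passes and the nested string case fails with two identical-looking type names.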

links to

[GitHub] Pull Request #29698 (viirya)