[SPARK-49743] OptimizeCsvJsonExpr should not change the schema of underlying StructType in GetArrayStructFields - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5.2
Fix Version/s: 3.5.4
Component/s: SQL
Labels:
- pull-request-available

Description

The `OptimizeCsvJsonExprs` rule can change the schema of the underlying `StructType` when the field name used to access the struct differs from the field name declared in the underlying struct (for example, in letter case).

This surfaces as a correctness issue: instead of returning the values of the corresponding column, the query returns NULL.

A simple example query is:

SELECT
  from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').a,
  from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').A
FROM
  range(3) as t

Here, the result is `[0], [1], [2]` for `a` but `[null], [null], [null]` for `A`. Since struct field access is case-insensitive, the result should have been `[0], [1], [2]` for both.
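The mechanism can be illustrated with a small toy model in plain Python. This is only a sketch under stated assumptions, not Spark's implementation: `parse_with_schema`, `resolve_ordinal`, `extract_field`, and the `keep_declared_name` flag are hypothetical names, and the toy parser is assumed to match JSON keys against schema names exactly, which is chosen here to reproduce the reported NULLs. It shows that pruning the schema down to the name written in the accessor (`A`) loses the value, while pruning to the declared field name (`a`) keeps it.

```python
import json


def parse_with_schema(text, field_names):
    """Toy stand-in for from_json on an array of structs.

    Assumption: JSON keys are matched against schema names exactly,
    and a key that is not found yields None (NULL).
    """
    rows = json.loads(text)
    return [{name: row.get(name) for name in field_names} for row in rows]


def resolve_ordinal(schema, accessor):
    """Resolve the accessor case-insensitively to a position in the schema."""
    matches = [i for i, name in enumerate(schema)
               if name.lower() == accessor.lower()]
    if len(matches) != 1:
        raise ValueError(f"cannot resolve field {accessor!r} in {schema}")
    return matches[0]


def extract_field(text, schema, accessor, keep_declared_name):
    """Prune the schema to the single accessed field, parse, then extract it.

    keep_declared_name=False models the buggy behaviour: the pruned schema
    takes its field name from the accessor. keep_declared_name=True models
    the intended behaviour: the pruned schema keeps the declared field name.
    """
    ordinal = resolve_ordinal(schema, accessor)
    pruned_name = schema[ordinal] if keep_declared_name else accessor
    pruned_rows = parse_with_schema(text, [pruned_name])
    return [row[pruned_name] for row in pruned_rows]


if __name__ == "__main__":
    schema = ["a", "b"]
    for i in range(3):
        text = json.dumps([{"a": i, "b": 2 * i}])
        buggy = extract_field(text, schema, "A", keep_declared_name=False)
        fixed = extract_field(text, schema, "A", keep_declared_name=True)
        print(buggy, fixed)
```

In this model, accessing `a` gives the same result under both settings; only an accessor whose spelling differs from the declared name exposes the difference.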

Issue Links

links to

GitHub Pull Request #48190

People

Assignee: Nikhil Sheoran

Reporter: Nikhil Sheoran

Votes: 0

Watchers: 2

Dates

Created: 20/Sep/24 18:53

Updated: 01/Oct/24 01:49

Resolved: 01/Oct/24 01:49