  1. Spark
  2. SPARK-49743

OptimizeCsvJsonExprs should not change the schema of underlying StructType in GetArrayStructFields


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.2
    • Fix Version/s: 3.5.4
    • Component/s: SQL

    Description

      The `OptimizeCsvJsonExprs` rule can change the schema of the underlying `StructType` when the field name used to access the struct differs (for example, only in case) from the name of the field in the underlying struct.

      This surfaces as a correctness issue: instead of returning the values of the corresponding column, the query returns NULL.
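The failure mode can be illustrated with a small plain-Python model. This is a sketch, not Spark's implementation: `parse_json_array`, `prune_buggy`, `prune_fixed`, and `get_field` are hypothetical names, and the model only assumes that the JSON parser matches keys by exact name while the pruned schema is built either from the accessor's spelling or from the underlying struct's own field name.

```python
import json


def parse_json_array(text, schema):
    """Parse a JSON array of objects, keeping only the fields named in `schema`.

    Keys are matched by exact (case-sensitive) name. A missing key yields None,
    which stands in for SQL NULL in this toy model.
    """
    return [{name: obj.get(name) for name in schema} for obj in json.loads(text)]


def prune_buggy(schema, accessor_name):
    # Hypothetical buggy pruning: the pruned schema adopts the accessor's spelling.
    return [accessor_name]


def prune_fixed(schema, accessor_name):
    # Hypothetical fixed pruning: match case-insensitively, keep the schema's spelling.
    return [name for name in schema if name.lower() == accessor_name.lower()]


def get_field(text, schema, accessor_name, prune):
    pruned = prune(schema, accessor_name)
    rows = parse_json_array(text, pruned)
    # After pruning, a single field is left; read it from every parsed struct.
    return [row[pruned[0]] for row in rows]


schema = ["a", "b"]
text = '[{"a": 1, "b": 2}]'
print(get_field(text, schema, "a", prune_buggy))  # [1]
print(get_field(text, schema, "A", prune_buggy))  # [None]
print(get_field(text, schema, "A", prune_fixed))  # [1]
```

In this model the NULL appears only when the accessor's spelling leaks into the schema handed to the parser; keeping the underlying field name avoids it.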

       

      A simple example query is:

      SELECT
        from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').a,
        from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').A
      FROM
        range(3) as t

       

       

      Here, the result is `[0], [1], [2]` for `a` but `[null], [null], [null]` for `A`. Since the struct field accessor is case-insensitive, the result should have been `[0], [1], [2]` for both.
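The expected output of the example query can be checked by hand with a short plain-Python sketch. It assumes case-insensitive field resolution and parsing against the full, unmodified schema; `FIELDS`, `resolve_ordinal`, `extract`, and `example_rows` are illustrative names and are not part of Spark.

```python
import json

FIELDS = ["a", "b"]  # field names of array<struct<a: INT, b: INT>>


def resolve_ordinal(accessor):
    """Case-insensitive lookup of a struct field, returning its position."""
    matches = [i for i, name in enumerate(FIELDS) if name.lower() == accessor.lower()]
    if len(matches) != 1:
        raise KeyError(f"cannot resolve field {accessor!r}")
    return matches[0]


def extract(json_text, accessor):
    """Return the accessed field from every struct in the JSON array."""
    ordinal = resolve_ordinal(accessor)
    # Look values up under the schema's own field name, not the accessor's spelling.
    return [obj.get(FIELDS[ordinal]) for obj in json.loads(json_text)]


def example_rows(n=3):
    rows = []
    for i in range(n):
        # Same string the SQL builds: '[{"a": ' || id || ', "b": ' || (2*id) || '}]'
        text = '[{"a": ' + str(i) + ', "b": ' + str(2 * i) + '}]'
        rows.append((extract(text, "a"), extract(text, "A")))
    return rows


for lower, upper in example_rows():
    print(lower, upper)
```

Under these assumptions both accessors yield `[0]`, `[1]`, and `[2]` for the three rows, which matches the result the description expects.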

            People

              nikhilsheoran-db Nikhil Sheoran