Spark / SPARK-40292

arrays_zip output unexpected alias column names


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.3.1, 3.2.3, 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Consider the following query:

      with q as (
        select
          named_struct(
            'my_array', array(named_struct('x', 1, 'y', 2))
          ) as my_struct
      )
      select
        arrays_zip(my_struct.my_array)
      from
        q 

      The latest Spark produces the schema below, in which the field name "my_array" has been changed to "0":

      root
       |-- arrays_zip(my_struct.my_array): array (nullable = true)
       |    |-- element: struct (containsNull = false)
       |    |    |-- 0: struct (nullable = true)
       |    |    |    |-- x: integer (nullable = true)
       |    |    |    |-- y: integer (nullable = true)

      Spark 3.1, by contrast, gives the expected result:

      root
       |-- arrays_zip(my_struct.my_array): array (nullable = true)
       |    |-- element: struct (containsNull = false)
       |    |    |-- my_array: struct (nullable = true)
       |    |    |    |-- x: integer (nullable = true)
       |    |    |    |-- y: integer (nullable = true)
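
The two schemas above differ only in the name of the zipped struct field, so the check can be automated. The following is a minimal PySpark sketch, not part of the original report: the helper names `zipped_field_names` and `positional_names` are hypothetical, and the caller is assumed to pass in an active `SparkSession`.

```python
# Query from the report, unchanged apart from indentation.
QUERY = """
with q as (
  select
    named_struct(
      'my_array', array(named_struct('x', 1, 'y', 2))
    ) as my_struct
)
select
  arrays_zip(my_struct.my_array)
from
  q
"""


def zipped_field_names(spark, query=QUERY):
    """Return the field names of the structs produced by arrays_zip.

    `spark` is assumed to be an active pyspark.sql.SparkSession.
    """
    schema = spark.sql(query).schema
    # The first output column is array<struct<...>>; inspect that struct's fields.
    return schema[0].dataType.elementType.fieldNames()


def positional_names(names):
    """Return the names that are bare indexes such as "0" (hypothetical check)."""
    return [name for name in names if name.isdigit()]
```

Going by the schemas in this report, `positional_names(zipped_field_names(spark))` would be expected to return an empty list on Spark 3.1 and to contain "0" on an affected build. The digit test is only a heuristic and would also flag a field that is legitimately named with digits.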
      


          People

            Assignee: Ivan Sadikov (ivan.sadikov)
            Reporter: Linhong Liu (linhongliu-db)