Spark / SPARK-33068

Spark 2.3 vs Spark 1.6 collect_list giving different schema


Details

    • Type: IT Help
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.3.4
    • Fix Version/s: None
    • Component/s: Spark Submit
    • Labels: None

    Description

      Hi,

      I am migrating from Spark 1.6 to Spark 2.3. However, collect_list now produces a different schema.

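      import org.apache.spark.sql.functions._
      import spark.implicits._ // for the $"col" syntax (sqlContext.implicits._ in Spark 1.6)

      val df_date_agg = df
          .groupBy($"a", $"b", $"c")
          .agg(sum($"d").alias("data1"), sum($"e").alias("data2"))
          .groupBy($"a")
          .agg(collect_list(array($"b", $"c", $"data1")).alias("final_data1"),
               collect_list(array($"b", $"c", $"data2")).alias("final_data2"))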
      

      When I run the code above in Spark 1.6, I get the following schema:

       |-- final_data1: array (nullable = true)
       |    |-- element: string (containsNull = true)
       |-- final_data2: array (nullable = true)
       |    |-- element: string (containsNull = true)
      

       

       

      but in Spark 2.3 the schema changes to:

       |-- final_data1: array (nullable = true)
       |    |-- element: array (containsNull = true)
       |    |    |-- element: string (containsNull = true)
       |-- final_data2: array (nullable = true)
       |    |-- element: array (containsNull = true)
       |    |    |-- element: string (containsNull = true)


      In Spark 1.6, array($"b",$"c",$"data1") is converted to a string like this:

      '[2020-09-26, Ayush, 103.67]'
      

      In Spark 2.3, it is returned as a WrappedArray instead:

      WrappedArray(2020-09-26, Ayush, 103.67)
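A minimal client-side sketch, assuming the dependent code reads collected rows and only needs the 1.6-style text. In Spark 2.3 each final_data1 value arrives as a Seq[Seq[String]] (printed above as WrappedArray), and each inner Seq can be rendered back into the '[...]' form with mkString. The helper name toLegacyString and the sample value are hypothetical.

```scala
// Hypothetical helper: render one inner array in the Spark 1.6 string style,
// e.g. Seq("2020-09-26", "Ayush", "103.67") -> "[2020-09-26, Ayush, 103.67]".
def toLegacyString(values: Seq[String]): String =
  values.mkString("[", ", ", "]")

// Hypothetical sample: one final_data1 value as returned by Spark 2.3.
val fromSpark23: Seq[Seq[String]] = Seq(Seq("2020-09-26", "Ayush", "103.67"))

// Convert every inner array back to the 1.6-style string.
val legacy: Seq[String] = fromSpark23.map(toLegacyString)
println(legacy.head) // [2020-09-26, Ayush, 103.67]
```

This only changes how the values are consumed. The DataFrame schema itself still contains the nested array.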
      

      I want to keep the schema as it is; otherwise, all the dependent code will have to change.
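If the schema itself must stay array<string>, one possible approach is to build the 1.6-style string explicitly before collect_list, so each collected element is a plain string rather than a nested array. This is a sketch, not a fix confirmed by this issue. It assumes a local SparkSession, hypothetical sample data that uses the issue's column names, and a hypothetical helper named legacyString. Note that concat_ws skips null inputs, so rows containing nulls may not match the 1.6 text exactly.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{collect_list, concat, concat_ws, lit, sum}

// Local session for the sketch; in a real job, reuse the existing SparkSession.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("collect-list-legacy-schema")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data using the column names from the issue.
val df = Seq(("k1", "2020-09-26", "Ayush", 103.67, 5.0)).toDF("a", "b", "c", "d", "e")

// Hypothetical helper: cast each column to string and join them as "[x, y, z]".
// Unlike the 1.6 output, concat_ws silently drops null values.
def legacyString(cols: Column*): Column =
  concat(lit("["), concat_ws(", ", cols.map(_.cast("string")): _*), lit("]"))

val df_date_agg = df
  .groupBy($"a", $"b", $"c")
  .agg(sum($"d").alias("data1"), sum($"e").alias("data2"))
  .groupBy($"a")
  .agg(
    collect_list(legacyString($"b", $"c", $"data1")).alias("final_data1"),
    collect_list(legacyString($"b", $"c", $"data2")).alias("final_data2"))

// final_data1 and final_data2 are now array<string> instead of array<array<string>>.
df_date_agg.printSchema()
val rows = df_date_agg.collect()
```

Because the strings are built with SQL expressions, the change stays inside the aggregation. Code downstream that expects array<string> should keep working. The trade-off is that the elements are opaque strings again, as they were in 1.6.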

       

      Thanks

       

          People

            Assignee: Unassigned
            Reporter: Ayush Goyal
            Votes: 0
            Watchers: 1
