Details
Type: IT Help
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 2.3.4
Description
Hi,
I am migrating from Spark 1.6 to Spark 2.3. However, collect_list is producing a different schema.
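import org.apache.spark.sql.functions._
import spark.implicits._ // enables the $"col" syntax; assumes a SparkSession named spark

val df_date_agg = df
  .groupBy($"a", $"b", $"c")
  .agg(sum($"d").alias("data1"), sum($"e").alias("data2"))
  .groupBy($"a")
  .agg(
    collect_list(array($"b", $"c", $"data1")).alias("final_data1"),
    collect_list(array($"b", $"c", $"data2")).alias("final_data2"))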
When I run the code above in Spark 1.6, I get the following schema:
 |-- final_data1: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- final_data2: array (nullable = true)
 |    |-- element: string (containsNull = true)
but in Spark 2.3 the schema changes to:
 |-- final_data1: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)
 |-- final_data2: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)
In Spark 1.6, array($"b", $"c", $"data1") is converted to a string like this:
'[2020-09-26, Ayush, 103.67]'
In Spark 2.3, it is converted to a WrappedArray:
WrappedArray(2020-09-26, Ayush, 103.67)
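If dependent code reads the collected column row by row, one way to bridge the difference is to render each inner array back into the bracketed form shown for Spark 1.6. The following is a minimal sketch, not a Spark API: the object LegacyFormat and its methods are hypothetical names, and it assumes the caller already has the inner arrays as Scala Seq values (a WrappedArray is a Seq, so it can be passed directly).

```scala
// Hypothetical helper (not part of Spark): renders collected arrays in the
// "[a, b, c]" form that the Spark 1.6 output in this issue used.
object LegacyFormat {
  // Formats one inner array, e.g. a WrappedArray read from a Row.
  // mkString renders null elements as the text "null".
  def toLegacyString(values: Seq[Any]): String =
    values.mkString("[", ", ", "]")

  // Formats a whole collected value, i.e. a Seq of inner arrays.
  def toLegacyStrings(collected: Seq[Seq[Any]]): Seq[String] =
    collected.map(toLegacyString)
}
```

For example, LegacyFormat.toLegacyString(Seq("2020-09-26", "Ayush", "103.67")) returns "[2020-09-26, Ayush, 103.67]". This only changes how downstream code sees the values; the DataFrame schema itself stays array<array<string>>.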
I want to keep my schema as it is; otherwise, all the dependent code has to change.
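One possible workaround, sketched under assumptions rather than taken from the resolution of this issue: build each element as a single string column before calling collect_list, so the result stays array<string> as in the Spark 1.6 schema. The object LegacySchemaSketch, the helper legacyElement and the sample row are hypothetical; the sketch assumes columns a to e as in the code above and that b, c and the sums can be cast to string. Note that concat_ws skips null inputs, so rows containing nulls may not format exactly as Spark 1.6 did.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object LegacySchemaSketch {
  // Builds "[b, c, value]" as one string column, so collect_list produces
  // array<string> instead of array<array<string>>.
  private def legacyElement(valueCol: String): Column =
    concat(
      lit("["),
      concat_ws(", ", col("b").cast("string"), col("c").cast("string"), col(valueCol).cast("string")),
      lit("]"))

  // Same two-step aggregation as in the report, with string elements.
  def aggregate(df: DataFrame): DataFrame =
    df.groupBy(col("a"), col("b"), col("c"))
      .agg(sum(col("d")).alias("data1"), sum(col("e")).alias("data2"))
      .groupBy(col("a"))
      .agg(
        collect_list(legacyElement("data1")).alias("final_data1"),
        collect_list(legacyElement("data2")).alias("final_data2"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("legacy-schema").getOrCreate()
    import spark.implicits._
    // Hypothetical sample row mirroring the values quoted in this issue.
    val df = Seq(("x", "2020-09-26", "Ayush", 103.67, 1.0)).toDF("a", "b", "c", "d", "e")
    val result = aggregate(df)
    result.printSchema() // final_data1 and final_data2 should be array<string>
    result.show(false)
    spark.stop()
  }
}
```

Building the string on the DataFrame side keeps the change in one place, so code that consumes final_data1 and final_data2 as arrays of strings would not need to change.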
Thanks