Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.2.1
Fix Version/s: None
Description
When two DataFrames whose corresponding columns carry different metadata are unioned, the metadata of the resulting column depends on the order of the union:
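    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([{'a': 1}])
    a = df
    # Same column 'a', but with a 'description' entry in its metadata
    b = df.select(col('a').alias('a', metadata={'description': 'xxx'}))

    print("a.union(b) gives {}".format(a.union(b).schema.fields[0].metadata))
    print("b.union(a) gives {}".format(b.union(a).schema.fields[0].metadata))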
This prints:

    a.union(b) gives {}
    b.union(a) gives {'description': 'xxx'}
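The output is consistent with the union taking its column definitions, metadata included, from the left-hand DataFrame and pairing columns by position. The pure-Python sketch below models that reading next to a stricter, order-independent alternative that rejects conflicting metadata. It does not use Spark; the `Field` class and both `union_schema_*` functions are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Field:
    """Hypothetical stand-in for a column: a name plus its metadata dict."""
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def union_schema_left_wins(left: List[Field], right: List[Field]) -> List[Field]:
    """Model of the observed behaviour: columns are paired by position and the
    left-hand fields, including their metadata, become the result's fields."""
    if len(left) != len(right):
        raise ValueError("union requires the same number of columns")
    # Right-hand metadata is ignored entirely, hence the order dependence.
    return list(left)


def union_schema_strict(left: List[Field], right: List[Field]) -> List[Field]:
    """Hypothetical order-independent policy: refuse to union columns whose
    metadata differs, so a.union(b) and b.union(a) always agree."""
    if len(left) != len(right):
        raise ValueError("union requires the same number of columns")
    for lf, rf in zip(left, right):
        if lf.metadata != rf.metadata:
            raise ValueError(f"metadata conflict on column {lf.name!r}")
    return list(left)


a = [Field("a")]
b = [Field("a", {"description": "xxx"})]
print(union_schema_left_wins(a, b)[0].metadata)  # {}
print(union_schema_left_wins(b, a)[0].metadata)  # {'description': 'xxx'}
```

Under the strict policy, both `union_schema_strict(a, b)` and `union_schema_strict(b, a)` raise `ValueError`, which is one possible answer to whether such a union should be allowed at all.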
This raises the question of whether such a union should be allowed at all: when the fields with differing metadata are nested inside a struct, the union fails instead, as reported in https://issues.apache.org/jira/projects/SPARK/issues/SPARK-23477
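Until the behaviour is settled, one option is to detect metadata differences before calling `union`, at the top level and inside nested structs alike. The sketch below works on the JSON-style dict form of a schema, which PySpark returns from `StructType.jsonValue()`. It assumes the usual layout of that form (`"type": "struct"` with a `"fields"` list, `"array"` with `"elementType"`, `"map"` with `"valueType"`), and the helper name `metadata_diffs` is hypothetical. Fields are compared by position because `union` pairs columns by position.

```python
from typing import Any, Dict, List, Tuple

Json = Dict[str, Any]


def metadata_diffs(left: Any, right: Any, path: str = "") -> List[Tuple[str, Json, Json]]:
    """Return (path, left_metadata, right_metadata) for every positionally
    paired field whose metadata differs, recursing into nested types."""
    diffs: List[Tuple[str, Json, Json]] = []
    # Primitive types are plain strings such as "long"; they carry no fields.
    if not isinstance(left, dict) or not isinstance(right, dict):
        return diffs
    kind = left.get("type")
    if kind != right.get("type"):
        return diffs  # a type mismatch is a different problem; skip it here
    if kind == "struct":
        for lf, rf in zip(left["fields"], right["fields"]):
            field_path = f"{path}.{lf['name']}" if path else lf["name"]
            lm, rm = lf.get("metadata", {}), rf.get("metadata", {})
            if lm != rm:
                diffs.append((field_path, lm, rm))
            # Recurse into the field's own type (e.g. a nested struct).
            diffs.extend(metadata_diffs(lf["type"], rf["type"], field_path))
    elif kind == "array":
        diffs.extend(metadata_diffs(left["elementType"], right["elementType"], path + "[]"))
    elif kind == "map":
        diffs.extend(metadata_diffs(left["valueType"], right["valueType"], path + "{}"))
    return diffs
```

With the DataFrames from the description, `metadata_diffs(a.schema.jsonValue(), b.schema.jsonValue())` would be expected to report the single entry `('a', {}, {'description': 'xxx'})`; for the struct case from SPARK-23477 the path would point into the nested field instead.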