Description
Unioning two DataFrames that contain UDTs fails with:
AnalysisException: u"unresolved operator 'Union;"
I tracked this down to this line: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala#L202
It compares the data types of the output attributes of both logical plans. For UDTs, however, each data type is a new instance of UserDefinedType or PythonUserDefinedType: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala#L158
The equality check therefore tests whether the two instances are the same object, and since they are not references to a singleton, the check fails.
Note: this works fine if you union a DataFrame with itself, because both sides then share the same data type instances.
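The behaviour described above can be illustrated without Spark. The following is only an analogy in plain Python, assuming a hypothetical FakeUDT class that stands in for a UDT and does not define its own equality; the real check happens in the Scala code linked above.

```python
class FakeUDT(object):
    """Hypothetical stand-in for a UDT class that does not override equality."""
    pass


left = FakeUDT()
right = FakeUDT()

# Default equality falls back to identity, so two separate instances differ.
separate_instances_equal = (left == right)

# An instance compared with itself is equal (the "union with itself" case).
same_instance_equal = (left == left)

print(separate_instances_equal, same_instance_equal)
```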
I have proposed a patch for this that overrides the equality operator on both classes: https://github.com/apache/spark/pull/11279
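As a rough sketch of what overriding equality can look like, here is one possible approach in plain Python, where two instances compare equal when they are of the same class. The class FakeUDTWithEq is hypothetical and this is not necessarily how the linked pull request implements it.

```python
class FakeUDTWithEq(object):
    """Hypothetical UDT stand-in whose equality depends on its type, not identity."""

    def __eq__(self, other):
        # Two instances are equal when they are of exactly the same class.
        return type(self) == type(other)

    def __ne__(self, other):
        # Defined explicitly because Python 2 does not derive __ne__ from __eq__.
        return not self.__eq__(other)

    def __hash__(self):
        # Keep hashing consistent with the type-based equality above.
        return hash(type(self).__name__)


left = FakeUDTWithEq()
right = FakeUDTWithEq()

# Separate instances now compare equal, unlike with identity-based equality.
instances_equal = (left == right)
```

With equality defined this way, a comparison between the data types of two separately created schemas would no longer depend on both sides holding the same object.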
Reproduction steps
from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
from pyspark.sql import types

schema = types.StructType([types.StructField("point", PythonOnlyUDT(), True)])

# note: they need to be two separate DataFrames
a = sqlCtx.createDataFrame([[PythonOnlyPoint(1.0, 2.0)]], schema)
b = sqlCtx.createDataFrame([[PythonOnlyPoint(3.0, 4.0)]], schema)
c = a.unionAll(b)