[SPARK-19425] Make ExtractEquiJoinKeys support UDT columns

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.1.0
Fix Version/s: 2.2.0
Component/s: SQL
Labels: None

Description

DataFrame.except does not work on UDT columns. ExtractEquiJoinKeys calls Literal.default on the UDT, but Literal.default does not handle UDTs, so an exception is thrown:

java.lang.RuntimeException: no default for type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7
at org.apache.spark.sql.catalyst.expressions.Literal$.default(literals.scala:179)
at org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys$$anonfun$4.apply(patterns.scala:117)
at org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys$$anonfun$4.apply(patterns.scala:110)

A simpler fix is to let Literal.default handle a UDT through its underlying SQL type. This also lets Spark use a more efficient join type on UDT columns.
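
The idea of resolving a UDT's default through its SQL type can be illustrated with a small standalone model. This is only a sketch: the types below are toy stand-ins rather than Spark's Catalyst classes, and the names DefaultLiteral and the UserDefinedType case class are hypothetical. The actual change is in the linked pull request.

```scala
// Toy stand-ins for a type hierarchy; these are NOT Spark classes.
sealed trait DataType
case object IntegerType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
// A user-defined type exposes the built-in type it is stored as.
final case class UserDefinedType(name: String, sqlType: DataType) extends DataType

object DefaultLiteral {
  // Returns a placeholder value for each supported type.
  def default(dataType: DataType): Any = dataType match {
    case IntegerType  => 0
    case DoubleType   => 0.0
    case StringType   => ""
    case ArrayType(_) => Seq.empty[Any]
    // The approach described above: delegate to the UDT's underlying
    // SQL type instead of failing with "no default for type ...".
    case UserDefinedType(_, sqlType) => default(sqlType)
  }
}
```

In this model, a UDT stored as an array of doubles gets the same default as the array type itself, so a caller that needs a default value no longer has to special-case UDTs.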

Besides except, this also fixes other similar scenarios. In summary, this fixes:

except on two Datasets with UDT
intersect on two Datasets with UDT
join with a join condition that uses <=> on UDT columns
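
The three scenarios above can be exercised with a short program along these lines. It is a sketch that assumes spark-sql and spark-mllib (a version containing the fix, 2.2.0 or later) are on the classpath. The object name UdtJoinKeysSketch, the helper run, and the column names are made up for illustration. On an affected 2.1.0 build, these calls would be expected to fail with the exception quoted in the description.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object UdtJoinKeysSketch {
  // Runs except, intersect, and a <=> join on a VectorUDT column and
  // returns the row count of each result.
  def run(spark: SparkSession): (Long, Long, Long) = {
    import spark.implicits._

    // "features" is a VectorUDT column in both DataFrames.
    val left = Seq(
      (1, Vectors.dense(1.0, 2.0)),
      (2, Vectors.dense(3.0, 4.0))
    ).toDF("id", "features")
    val right = Seq((1, Vectors.dense(1.0, 2.0))).toDF("id", "features")

    val exceptCount = left.except(right).count()
    val intersectCount = left.intersect(right).count()
    // Null-safe equality on the UDT column as the join condition.
    val joinCount = left.join(right, left("features") <=> right("features")).count()
    (exceptCount, intersectCount, joinCount)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("udt-join-keys-sketch")
      .getOrCreate()
    try println(run(spark))
    finally spark.stop()
  }
}
```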

Issue Links

links to

[Github] Pull Request #16765 (viirya)

People

Assignee: L. C. Hsieh

Reporter: L. C. Hsieh

Dates

Created: 01/Feb/17 15:14

Updated: 04/Feb/17 23:59

Resolved: 04/Feb/17 23:59