[SPARK-18634] Corruption and Correctness issues with exploding Python UDFs - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.2, 2.1.0
Fix Version/s: 2.0.3, 2.1.0
Component/s: PySpark, SQL
Labels: None

Description

Exploding the result of a Python UDF in Spark SQL can produce incorrect output.

There are two cases where, depending on the DataType of the exploded column, the result can be flat-out wrong or corrupt. It seems that something goes wrong when Tungsten is told the schema of the rows during or after applying the UDF.

Please check the notebook linked below for a reproduction.

Notebook: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6186780348633019/3425836135165635/4343791953238323/latest.html
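
The reporter's actual reproduction is in the notebook linked above. Purely as an illustration of the pattern the description refers to, the sketch below explodes the array returned by a Python UDF. The names `return_range` and `explode_udf_result`, the struct field names, and the array-of-structs return type are assumptions made for this example, not taken from the notebook, and the query may not trigger the bug on any particular version.

```python
def return_range(value):
    """Hypothetical UDF body: (int, str) pairs for value - 1 and value."""
    return [(i, str(i)) for i in range(value - 1, value + 1)]


def explode_udf_result(spark):
    """Build a query that explodes the array produced by a Python UDF.

    `spark` is an existing SparkSession. pyspark is imported lazily so that
    `return_range` can be exercised without a Spark installation.
    """
    from pyspark.sql.functions import explode, udf
    from pyspark.sql.types import (
        ArrayType, IntegerType, StringType, StructField, StructType,
    )

    # Assumed return type: an array of (integer_val, string_val) structs.
    element_type = StructType([
        StructField("integer_val", IntegerType()),
        StructField("string_val", StringType()),
    ])
    range_udf = udf(return_range, ArrayType(element_type))

    df = spark.range(10)  # single column named `id`
    # One output row per array element; the exploded struct lands in `col`.
    return df.select("id", explode(range_udf(df["id"])))
```

On an affected version, comparing `explode_udf_result(spark).collect()` against the plain Python output of `return_range` for each `id` is one way to check whether rows come back wrong or corrupted.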

Issue Links

links to

[GitHub] Pull Request #16120 (viirya)

[GitHub] Pull Request #16170 (hvanhovell)

People

Assignee: L. C. Hsieh

Reporter: Burak Yavuz

Votes: 0

Watchers: 2

Dates

Created: 29/Nov/16 20:54

Updated: 06/Dec/16 11:17

Resolved: 06/Dec/16 01:51