Spark / SPARK-18634

Corruption and Correctness issues with exploding Python UDFs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.1.0
    • Fix Version/s: 2.0.3, 2.1.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      Exploding the result of a Python UDF in Spark SQL can produce incorrect output.

      There are two cases where, depending on the DataType of the exploded column, the result is either flat-out wrong or corrupt. It seems that something goes wrong when Tungsten is told the schema of the rows during or after applying the UDF.

      Please see the notebook below for reproduction code.

      Notebook: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6186780348633019/3425836135165635/4343791953238323/latest.html

          People

            Assignee: viirya L. C. Hsieh
            Reporter: brkyvz Burak Yavuz
            Votes: 0
            Watchers: 2
