Spark / SPARK-18634

Corruption and Correctness issues with exploding Python UDFs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.1.0
    • Fix Version/s: 2.0.3, 2.1.0
    • Component/s: PySpark, SQL
    • Labels:
      None

      Description

      Exploding the result of a Python UDF in Spark SQL can produce incorrect output.

      There are two cases where, depending on the DataType of the exploded column, the result is either flat-out wrong or corrupt. It seems that something goes wrong when Tungsten is told the schema of the rows during or after the UDF is applied.

      Please see the notebook below for a reproduction.

      Notebook: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6186780348633019/3425836135165635/4343791953238323/latest.html
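
      The notebook's code is not reproduced in this ticket text. As a rough illustration only, the kind of check that would expose such a bug can be sketched as follows: compute the exploded rows in plain Python, then compare them with what Spark SQL returns for the same UDF. The names `return_range`, `expected_rows`, and `spark_rows`, the struct field names, and the choice of an array-of-structs return type are all assumptions made for this sketch, not details taken from the reporter's notebook.

```python
def return_range(value):
    """Hypothetical UDF body: pairs (i, str(i)) for i in [value - 1, value]."""
    return [(i, str(i)) for i in range(value - 1, value + 1)]


def expected_rows(n):
    """Rows that exploding return_range(id) should yield for ids 0..n-1 (no Spark needed)."""
    return [(i, iv, sv) for i in range(n) for (iv, sv) in return_range(i)]


def spark_rows(n):
    """Run the same computation through Spark SQL. Requires pyspark to be installed."""
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, udf
    from pyspark.sql.types import (
        ArrayType, IntegerType, StringType, StructField, StructType,
    )

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # Assumed return type: an array of (integer, string) structs.
    schema = ArrayType(StructType([
        StructField("integer_val", IntegerType()),
        StructField("string_val", StringType()),
    ]))
    range_udf = udf(return_range, schema)

    df = spark.range(n)
    # Explode the UDF result so each struct in the array becomes its own row.
    exploded = df.select("id", explode(range_udf(df.id)).alias("pair"))
    rows = exploded.select("id", "pair.integer_val", "pair.string_val").collect()
    return sorted((r[0], r[1], r[2]) for r in rows)
```

      On a build that includes the fix, `spark_rows(10) == expected_rows(10)` would be expected to hold; a mismatch or an exception on an affected version would be consistent with the behaviour described above, though this sketch is not confirmed to trigger it.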

            People

            • Assignee:
              viirya L. C. Hsieh
              Reporter:
              brkyvz Burak Yavuz
            • Votes:
              0
              Watchers:
              2