Spark / SPARK-35382

Fix lambda variable name issues in nested DataFrame functions in Python APIs


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.1.2, 3.2.0
    • Component/s: PySpark

    Description

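      The Python API has the same lambda variable naming issue as SPARK-34794: when higher-order functions such as transform are nested, the outer lambda variable n ends up resolving to the inner variable l. In the example below, each struct therefore pairs a letter with itself ({a, a}, {b, b}, ...) instead of pairing a number with a letter ({1, a}, {1, b}, ...).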

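      from pyspark.sql import SparkSession
      from pyspark.sql.functions import struct, transform

      spark = SparkSession.builder.getOrCreate()
      df = spark.sql("SELECT array(1, 2, 3) AS numbers, array('a', 'b', 'c') AS letters")
      df.select(
          transform(
              "numbers",
              # Outer lambda: n should be each element of `numbers`;
              # inner lambda: l should be each element of `letters`.
              lambda n: transform("letters", lambda l: struct(n.alias("n"), l.alias("l")))
          )
      ).show()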
      
      +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      |transform(numbers, lambdafunction(transform(letters, lambdafunction(struct(namedlambdavariable() AS n, namedlambdavariable() AS l), namedlambdavariable())), namedlambdavariable()))|
      +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      |                                                                                                                                                                [[{a, a}, {b, b},...|
      +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
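The issue title attributes the wrong output to lambda variable names. The sketch below is a hypothetical toy model in plain Python, not PySpark's code: it assumes lambda variables are looked up by name when the expression is evaluated, and its transform, struct, Var and run helpers are invented for illustration only. When both nested lambdas reuse one name (here "x"), the inner binding shadows the outer one and reproduces the {a, a} pattern; giving each lambda a unique name (an assumed fix strategy, shown for contrast) yields the expected number/letter pairs.

```python
import itertools
from typing import Callable, Dict, List

# Hypothetical toy model, NOT PySpark internals: an expression is a function
# of an environment that maps lambda variable names to their current values.
Env = Dict[str, object]
Expr = Callable[[Env], object]


class Var:
    """A lambda variable that is looked up by name at evaluation time."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __call__(self, env: Env) -> object:
        return env[self.name]


_counter = itertools.count()


def transform(array: List[object], func: Callable[[Var], Expr], fresh: bool) -> Expr:
    """Build an expression that applies `func` to every element of `array`.

    fresh=False: every lambda variable is named "x", so nested lambdas collide.
    fresh=True: each lambda gets a unique name ("x_0", "x_1", ...).
    """
    name = f"x_{next(_counter)}" if fresh else "x"
    body = func(Var(name))  # the Python lambda runs once, when building the expression

    def evaluate(env: Env) -> List[object]:
        # Bind this lambda's variable; an inner binding with the same name
        # overwrites (shadows) the outer one.
        return [body({**env, name: element}) for element in array]

    return evaluate


def struct(*fields: Expr) -> Expr:
    """Evaluate each field in the same environment and collect a tuple."""
    return lambda env: tuple(field(env) for field in fields)


def run(fresh: bool) -> List[object]:
    """Mirror the nested transform from the issue's example."""
    numbers, letters = [1, 2, 3], ["a", "b", "c"]
    expr = transform(
        numbers,
        lambda n: transform(letters, lambda l: struct(n, l), fresh),
        fresh,
    )
    return expr({})
```

run(fresh=False) returns [[('a', 'a'), ('b', 'b'), ('c', 'c')]] three times over, matching the {a, a}, {b, b} output above, while run(fresh=True) returns [[(1, 'a'), (1, 'b'), (1, 'c')], [(2, 'a'), ...], [(3, 'a'), ...]].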
      

            People

              • Takuya Ueshin (ueshin)
              • Hyukjin Kwon (hyukjin.kwon)
              • Votes: 0
              • Watchers: 2
