  1. Spark
  2. SPARK-36554

Error message when using Spark SQL functions directly on DataFrame columns without a select expression


Details

    Description

      The code below builds and displays a DataFrame successfully. Here the make_date function is called inside a SQL expression string passed to expr.

       

      # Only expr needs importing; make_date is resolved by Spark SQL inside the expression string.
      from pyspark.sql.functions import expr

      df = spark.createDataFrame([(2020, 6, 26), (1000, 2, 29), (-44, 1, 1)], ['Y', 'M', 'D'])

      df.select("*", expr("make_date(Y, M, D) as lk")).show()

       

      The code below fails with the message "cannot import name 'make_date' from 'pyspark.sql.functions'". Here make_date is imported from pyspark.sql.functions and called directly on DataFrame columns instead of inside a SQL expression string.

       

      # On this PySpark build, the import below is the statement that raises the error.
      from pyspark.sql.functions import make_date

      df = spark.createDataFrame([(2020, 6, 26), (1000, 2, 29), (-44, 1, 1)], ['Y', 'M', 'D'])

      df.select(make_date(df.Y, df.M, df.D).alias("datefield")).show()
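Until a Python wrapper for make_date is importable, one possible workaround is to keep calling the SQL function through an expression string, as the first snippet does. The sketch below only builds that string; `make_date_expr` is a hypothetical helper name, not part of PySpark, and the commented lines assume an active SparkSession and the `df` from this report.

```python
def make_date_expr(year_col: str, month_col: str, day_col: str, alias: str) -> str:
    """Build a Spark SQL expression string that calls make_date on three columns.

    Hypothetical helper for illustration; it is not a PySpark API.
    """
    return f"make_date({year_col}, {month_col}, {day_col}) as {alias}"


sql = make_date_expr("Y", "M", "D", "datefield")
print(sql)  # make_date(Y, M, D) as datefield

# With an active SparkSession and the DataFrame from this report (assumption):
# from pyspark.sql.functions import expr
# df.select(expr(sql)).show()
```

Column names are interpolated as-is, so this sketch is only suitable for trusted, simple identifiers such as Y, M, and D.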

       
      The generated error message, "cannot import name 'make_date' from 'pyspark.sql.functions'", is misleading.
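The wording of this message matches what Python itself reports when a module does not define a requested name, which suggests that this PySpark build has no Python-level make_date wrapper even though the SQL function exists. The sketch below checks for a name before importing it; `describe_import` is a hypothetical helper, and the standard-library `math` module stands in for `pyspark.sql.functions` so the example runs without Spark.

```python
import importlib


def describe_import(module_name: str, name: str) -> str:
    """Report whether `name` could be imported from `module_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return f"module unavailable: {exc}"
    if hasattr(module, name):
        return "available"
    # Mirrors the wording of the message quoted in this report.
    return f"cannot import name {name!r} from {module_name!r}"


print(describe_import("math", "sqrt"))       # available
print(describe_import("math", "make_date"))  # cannot import name 'make_date' from 'math'
```

Calling `describe_import("pyspark.sql.functions", "make_date")` on a given installation would show whether the direct import can succeed there; the result depends on the installed PySpark version.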

       

      Attachments

        1. Screen Shot .png
          202 kB
          Lekshmi Ramachandran


          People

            nicolasazrak Nicolas Azrak
            lekshmiii Lekshmi Ramachandran
            Lekshmi Ramachandran Lekshmi Ramachandran


              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified