  1. Spark
  2. SPARK-20416

Column names inconsistent for UDFs in SQL vs Dataset

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

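      As the session below shows, the column name generated for a UDF call differs between SQL and the Dataset API: SQL yields UDF:timesTwo(1), which includes the registered function name, while the Dataset API yields UDF(x), which omits it.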

      scala> val timesTwoUDF = spark.udf.register("timesTwo", (x: Int) => x * 2)
      timesTwoUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,IntegerType,Some(List(IntegerType)))
      
      scala> spark.sql("SELECT timesTwo(1)").show
      +---------------+
      |UDF:timesTwo(1)|
      +---------------+
      |              2|
      +---------------+
      
      scala> spark.range(1, 2).toDF("x").select(timesTwoUDF($"x")).show
      +------+
      |UDF(x)|
      +------+
      |     2|
      +------+
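
      Until the generated names agree, one way to avoid depending on them is to alias the UDF result explicitly in both APIs. The following is a minimal sketch, not part of the original report: the object name UdfColumnNames, the helper aliasedNames, the alias doubled, and the local[1] master are all illustrative assumptions, and Seq(1).toDF("x") stands in for spark.range(1, 2).toDF("x") so that the input column is already an Int.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical helper object; none of these names come from the report.
object UdfColumnNames {

  // Returns the output column names of the SQL query and the Dataset query,
  // both of which alias the UDF call to the same explicit name.
  def aliasedNames(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._

    // Same registration call as in the report.
    val timesTwoUDF = spark.udf.register("timesTwo", (x: Int) => x * 2)

    // SQL: alias with AS so the auto-generated name is never used.
    val viaSql = spark.sql("SELECT timesTwo(1) AS doubled")

    // Dataset API: alias with Column.as for the same reason.
    val viaDataset = Seq(1).toDF("x").select(timesTwoUDF(col("x")).as("doubled"))

    (viaSql.columns.toSeq, viaDataset.columns.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this check.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("udf-column-names")
      .getOrCreate()

    val (sqlNames, datasetNames) = aliasedNames(spark)
    println(sqlNames.mkString(","))     // doubled
    println(datasetNames.mkString(",")) // doubled

    spark.stop()
  }
}
```

      With explicit aliases, both queries expose the same column name regardless of how a given Spark version formats the auto-generated one.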
      

          People

            Assignee: Takeshi Yamamuro (maropu)
            Reporter: Jacek Laskowski (jlaskowski)