Spark / SPARK-20416

Column names inconsistent for UDFs in SQL vs Dataset

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

      Description

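      The column name generated for a UDF call differs between SQL and the Dataset API. SQL includes the registered function name (UDF:timesTwo(1)), whereas the Dataset API omits it (UDF(x)):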

      scala> val timesTwoUDF = spark.udf.register("timesTwo", (x: Int) => x * 2)
      timesTwoUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,IntegerType,Some(List(IntegerType)))
      
      scala> spark.sql("SELECT timesTwo(1)").show
      +---------------+
      |UDF:timesTwo(1)|
      +---------------+
      |              2|
      +---------------+
      
      scala> spark.range(1, 2).toDF("x").select(timesTwoUDF($"x")).show
      +------+
      |UDF(x)|
      +------+
      |     2|
      +------+
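
Until the generated names are aligned, one possible workaround is to alias the UDF result explicitly in both code paths, so the column name no longer depends on how Spark renders the UDF expression. The following is a minimal sketch, not part of the original report: the object name UdfColumnNames, the helper aliasedColumnNames, and the local[1] master are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical helper object, used only to illustrate the workaround.
object UdfColumnNames {

  // Returns the column names produced by the SQL path and the Dataset path
  // when the UDF result is given an explicit alias in both.
  def aliasedColumnNames(spark: SparkSession): (Seq[String], Seq[String]) = {
    // Same UDF as in the report above.
    val timesTwoUDF = spark.udf.register("timesTwo", (x: Int) => x * 2)

    // SQL path: alias with AS.
    val viaSql = spark.sql("SELECT timesTwo(1) AS timesTwo")

    // Dataset path: alias with Column.as.
    val viaDataset = spark.range(1, 2).toDF("x")
      .select(timesTwoUDF(col("x")).as("timesTwo"))

    (viaSql.columns.toSeq, viaDataset.columns.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("udf-column-names")
      .getOrCreate()

    val (sqlCols, datasetCols) = aliasedColumnNames(spark)
    println(sqlCols.mkString(","))     // prints "timesTwo"
    println(datasetCols.mkString(",")) // prints "timesTwo"

    spark.stop()
  }
}
```

With explicit aliases, both queries expose a column named timesTwo regardless of the default naming described in this issue.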
      

            People

            • Assignee: Takeshi Yamamuro (maropu)
            • Reporter: Jacek Laskowski (jlaskowski)