Spark / SPARK-26331

Allow SQL UDF registration to recognize default function values from Scala


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      As described here:

      https://stackoverflow.com/q/53702727/3576984

      I have a UDF that I would like to be flexible enough to accept 3 arguments (in general, n + k), even though in most calls only 2 (in general, n) are needed. The natural approach is to implement the UDF with 3 arguments, one of which has a standard default value.

      Copying a toy example from SO:

      // Scala
      package myUDFs

      import org.apache.spark.sql.api.java.UDF3

      class my_udf extends UDF3[Int, Int, Int, Int] {
        // c has a Scala default of 6, but Spark's UDF3 interface still
        // requires all three arguments at the SQL call site
        override def call(a: Int, b: Int, c: Int = 6): Int = {
          c * (a + b)
        }
      }

      I would prefer the following to give the expected output of 18:

      # Python
      from pyspark.conf import SparkConf
      from pyspark.sql import SparkSession
      from pyspark.sql.types import IntegerType

      spark_conf = SparkConf().setAll([('spark.jars', 'myUDFs-assembly-0.1.1.jar')])
      spark = (SparkSession.builder
               .appName('my_app')
               .config(conf=spark_conf)
               .enableHiveSupport()
               .getOrCreate())
      spark.udf.registerJavaFunction("my_udf", "myUDFs.my_udf", IntegerType())

      # Desired: the omitted third argument falls back to the Scala default
      # (c = 6), so this would return 18
      spark.sql('select my_udf(1, 2)').collect()

      But it seems this is currently impossible.

      It seems the only current workaround is to define two UDFs: one with the default value pre-applied, the other exposing all parameters (sketched below).
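
      A minimal sketch of that workaround, assuming the same myUDFs package and
      jar as above; the class names MyUdf2 and MyUdf3 are hypothetical:

      // Scala
      package myUDFs

      import org.apache.spark.sql.api.java.{UDF2, UDF3}

      // Fully parameterized version: the caller must supply all three arguments
      class MyUdf3 extends UDF3[Int, Int, Int, Int] {
        override def call(a: Int, b: Int, c: Int): Int = c * (a + b)
      }

      // Two-argument version with the default c = 6 baked in
      class MyUdf2 extends UDF2[Int, Int, Int] {
        private val full = new MyUdf3
        override def call(a: Int, b: Int): Int = full.call(a, b, 6)
      }

      Each class would then be registered from PySpark under its own SQL name,
      e.g. registerJavaFunction("my_udf", "myUDFs.MyUdf2", IntegerType()) and
      registerJavaFunction("my_udf3", "myUDFs.MyUdf3", IntegerType()), since a
      single registered name cannot dispatch on argument count here.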


People

    Assignee: Unassigned
    Reporter: Michael Chirico
    Votes: 0
    Watchers: 1
