Description
As described here:
https://stackoverflow.com/q/53702727/3576984
I have a UDF I would like to be flexible enough to accept 3 arguments (or in general n+k), but for the most part, only 2 (in general, n) are required. The natural approach to this is to implement the UDF with 3 arguments, one of which has a standard default value.
Copying a toy example from SO:
// Scala
package myUDFs

import org.apache.spark.sql.api.java.UDF3

class my_udf extends UDF3[Int, Int, Int, Int] {
  override def call(a: Int, b: Int, c: Int = 6): Int = {
    c*(a + b)
  }
}
I would prefer the following to give the expected output of 18:
# Python
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

spark_conf = SparkConf().setAll([('spark.jars', 'myUDFs-assembly-0.1.1.jar')])
spark = (
    SparkSession.builder
    .appName('my_app')
    .config(conf=spark_conf)
    .enableHiveSupport()
    .getOrCreate()
)

# Register the Scala class as a SQL function that returns an integer
spark.udf.registerJavaFunction("my_udf", "myUDFs.my_udf", IntegerType())
# Call with only two arguments, hoping c falls back to its default of 6
spark.sql('select my_udf(1, 2)').collect()
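But it seems this is currently impossible: the default value declared on the Scala side does not appear to be honored when the function is called from SQL with only two arguments.
It seems the only current workaround is to define two UDFs: one with the default value pre-specified, the other accepting all of the parameters.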
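For reference, the two-UDF workaround could look roughly like the sketch below. The class names `my_udf2` and `my_udf3` are hypothetical and not part of the original example; the two-argument class simply delegates to the three-argument one with `c` fixed to 6.

```scala
// Scala
package myUDFs

import org.apache.spark.sql.api.java.{UDF2, UDF3}

// Three-argument version: the caller always supplies c explicitly.
// (Hypothetical name, chosen for this sketch.)
class my_udf3 extends UDF3[Int, Int, Int, Int] {
  override def call(a: Int, b: Int, c: Int): Int = c * (a + b)
}

// Two-argument version: c is fixed to the default value of 6.
// (Hypothetical name, chosen for this sketch.)
class my_udf2 extends UDF2[Int, Int, Int] {
  private val full = new my_udf3
  override def call(a: Int, b: Int): Int = full.call(a, b, 6)
}
```

Each class would then be registered under its own SQL name with `spark.udf.registerJavaFunction`, so callers still have to choose the function that matches the number of arguments they pass, which is exactly the duplication this request hopes to avoid.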