FLINK-5826

UDF/UDTF should support variable types and variable arguments

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: None
    • Labels: None

      Description

      In some cases, a UDF/UDTF should accept a variable number of arguments and arguments of varying types. Many UDF/UDTF developers want to leave both the number and the types of the arguments flexible for their users.

      Thus, we should support the following styles of UDF/UDTFs.

      Example 1, in Java:

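      import org.apache.flink.table.functions.ScalarFunction;

      public class SimpleUDF extends ScalarFunction {
      	// Accepts any number of arguments of any type.
      	public int eval(Object... args) {
      		// do something with args; the count is only a placeholder result
      		return args.length;
      	}
      }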
      

      Example 2, in Scala:

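      import org.apache.flink.table.functions.ScalarFunction

      class SimpleUDF extends ScalarFunction {
        // Accepts any number of arguments of any type.
        def eval(args: Any*): Int = {
          // do something with args; the count is only a placeholder result
          args.length
        }
      }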
      

      If we modify UserDefinedFunctionUtils.getSignature() so that both signatures pass, the first example works normally. The second example, however, raises an exception:

      Caused by: org.codehaus.commons.compiler.CompileException: Line 58, Column 0: No applicable constructor/method found for actual parameters "java.lang.String"; candidates are: "public java.lang.Object test.SimpleUDF.eval(scala.collection.Seq)"
        at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11523) ~[janino-3.0.6.jar:?]
        at org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:8679) ~[janino-3.0.6.jar:?]
        at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8539) ~[janino-3.0.6.jar:?]
      

      The reason is that the Scala compiler desugars the signature of a varargs method. The method

       def eval(args: Any*)

      will become

       def eval(args: scala.collection.Seq[Any])

      in the class file.

      The code generation produces Java code. If the generated code uses the Java-style

      eval(Object... args)

      to call the Scala method, it will raise the above exception.

      However, I can't always require users to write a UDF/UDTF in Java. Any ideas on how to support variable types and variable arguments in Scala UDF/UDTFs without this compilation failure?

        Activity

        twalthr Timo Walther added a comment -

        Both subtasks have been implemented.

        clarkyzl Zhuoluo Yang added a comment -

        We would like to split the task into two sub-tasks.
        1. support ScalarFunction
        2. support TableFunction

        clarkyzl Zhuoluo Yang added a comment -

        Thanks Jark Wu, I'd like to run some tests on this. It seems like a concise design. IMHO, a meaningful exception would also be useful to users where necessary. I will try this annotation.

        jark Jark Wu added a comment -

        The generated code is pure Java, and we try to avoid adding Scala-related code to it (such as Scala Seq).

        I found that if we add a varargs annotation to the eval method in Scala, the compiler generates a Java version of the method, and then we can handle it like a Java varargs method.

        class MyFunction {
          @annotation.varargs
          def eval(args: String*): Unit = {
          }
        }
        

        What about requiring the annotation whenever varargs are used, and throwing an exception that tells users how to add it when no varargs annotation is declared?
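
        A minimal sketch of such a check, written in Java because the thread's generated code is Java. It is not Flink code: the class names, the validate helper and the exception message are all hypothetical, and java.util.List stands in for scala.collection.Seq so that the example runs without Scala on the classpath.

```java
import java.lang.reflect.Method;
import java.util.List;

final class VarArgsValidationSketch {

    /** Java-style varargs: compiled to eval(String[]) with the varargs flag set. */
    static class JavaStyleUdf {
        public int eval(String... args) {
            return args.length;
        }
    }

    /**
     * Stand-in for a Scala eval(args: String*) compiled without the varargs annotation:
     * a single sequence parameter and no varargs flag.
     * ASSUMPTION: java.util.List plays the role of scala.collection.Seq here.
     */
    static class SeqStyleUdf {
        public int eval(List<String> args) {
            return args.size();
        }
    }

    /**
     * Hypothetical check: rejects a function class that has an eval method ending in a
     * sequence parameter but no eval method that Java sees as varargs.
     */
    static void validate(Class<?> udf, Class<?> sequenceType) {
        boolean hasSequenceEval = false;
        boolean hasVarArgsEval = false;
        for (Method m : udf.getMethods()) {
            if (!m.getName().equals("eval")) {
                continue;
            }
            if (m.isVarArgs()) {
                hasVarArgsEval = true;
            }
            Class<?>[] params = m.getParameterTypes();
            if (params.length > 0 && sequenceType.isAssignableFrom(params[params.length - 1])) {
                hasSequenceEval = true;
            }
        }
        if (hasSequenceEval && !hasVarArgsEval) {
            throw new IllegalArgumentException(
                udf.getName() + ": eval takes a sequence parameter but no varargs method was found. "
                    + "If this is a Scala variable-argument method, annotate it with "
                    + "@scala.annotation.varargs.");
        }
    }

    public static void main(String[] args) {
        validate(JavaStyleUdf.class, List.class); // passes silently
        try {
            validate(SeqStyleUdf.class, List.class);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

        One limitation of this approach: after compilation, Any* and an explicit Seq[Any] parameter look the same to Java reflection, so a check like this would also flag a function that takes a sequence on purpose.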

        clarkyzl Zhuoluo Yang added a comment -

        Thanks Xiaowei Jiang. After some investigation of "CodeGenerator.scala", I think I can generate different code for a Scala "eval()". I'd like to try coding this.

        jiangxw Xiaowei Jiang added a comment -

        You can look at the signature when you do codegen and generate different code for each case.
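
        A rough illustration of that idea, not the actual CodeGenerator logic: the sketch below inspects the reflected eval method and emits a different Java call expression depending on its shape. generateCall, findEval and the two sample classes are hypothetical names, and java.util.List again stands in for scala.collection.Seq; a real Scala Seq would need its own conversion code, which is not shown.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class VarArgsCallGenSketch {

    static class JavaStyleUdf {
        public int eval(Object... args) {
            return args.length;
        }
    }

    /** ASSUMPTION: java.util.List stands in for scala.collection.Seq. */
    static class SeqStyleUdf {
        public int eval(List<Object> args) {
            return args.size();
        }
    }

    /** Returns the first public method named eval, or fails if there is none. */
    static Method findEval(Class<?> udf) {
        for (Method m : udf.getMethods()) {
            if (m.getName().equals("eval")) {
                return m;
            }
        }
        throw new IllegalArgumentException("no eval method in " + udf.getName());
    }

    /**
     * Builds the source text of a call to eval. Assumes the operands were already
     * matched against the signature, so there are at least as many operands as
     * fixed (non-varargs) parameters.
     */
    static String generateCall(String functionTerm, Method eval, List<String> operandTerms) {
        Class<?>[] params = eval.getParameterTypes();
        if (eval.isVarArgs()) {
            // Java varargs: pass fixed operands directly, pack the rest into an array.
            int fixed = params.length - 1;
            List<String> callArgs = new ArrayList<>(operandTerms.subList(0, fixed));
            String elementType = params[fixed].getComponentType().getCanonicalName();
            String packed = String.join(", ", operandTerms.subList(fixed, operandTerms.size()));
            callArgs.add("new " + elementType + "[] {" + packed + "}");
            return functionTerm + ".eval(" + String.join(", ", callArgs) + ")";
        }
        if (params.length == 1 && List.class.isAssignableFrom(params[0])) {
            // Sequence-style parameter: wrap all operands in one collection.
            return functionTerm + ".eval(java.util.Arrays.asList("
                + String.join(", ", operandTerms) + "))";
        }
        // Plain fixed-arity method: pass operands one to one.
        return functionTerm + ".eval(" + String.join(", ", operandTerms) + ")";
    }

    public static void main(String[] args) {
        List<String> operands = Arrays.asList("a", "b");
        // prints: fn.eval(new java.lang.Object[] {a, b})
        System.out.println(generateCall("fn", findEval(JavaStyleUdf.class), operands));
        // prints: fn.eval(java.util.Arrays.asList(a, b))
        System.out.println(generateCall("fn", findEval(SeqStyleUdf.class), operands));
    }
}
```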

        clarkyzl Zhuoluo Yang added a comment - - edited

        I have already made a small modification to UserDefinedFunctionUtils.getSignature(). The basic idea is to let "Object..." and "Any*" pass and return the corresponding signature. This modification works for Java only; the Scala version fails. Because the code generation produces Java code, calling eval(Seq<Any>) from the generated Java causes a problem, whereas calling eval(Object... args) from the generated Java works without any problem.
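
        To make "let the signature pass" concrete, here is a minimal sketch of matching actual argument types against a varargs eval method. It does not reproduce UserDefinedFunctionUtils.getSignature(); the matches and findEval helpers and the ConcatUdf class are hypothetical, and only java.lang.reflect calls from the JDK are used.

```java
import java.lang.reflect.Method;

final class VarArgsSignatureSketch {

    /** Stand-in for a user-defined function; not a Flink class. */
    static class ConcatUdf {
        public String eval(String... args) {
            return String.join("", args);
        }
    }

    /** Returns the first public method named eval, or fails if there is none. */
    static Method findEval(Class<?> udf) {
        for (Method m : udf.getMethods()) {
            if (m.getName().equals("eval")) {
                return m;
            }
        }
        throw new IllegalArgumentException("no eval method in " + udf.getName());
    }

    /**
     * Returns true if a call with the given argument types can bind to the method.
     * For a varargs method, the trailing array parameter accepts zero or more
     * arguments of its component type. Primitive-to-wrapper conversion and passing
     * an explicit array are not handled in this sketch.
     */
    static boolean matches(Method method, Class<?>[] actual) {
        Class<?>[] declared = method.getParameterTypes();
        int fixed = method.isVarArgs() ? declared.length - 1 : declared.length;
        if (actual.length < fixed || (!method.isVarArgs() && actual.length != fixed)) {
            return false;
        }
        // Fixed parameters must match position by position.
        for (int i = 0; i < fixed; i++) {
            if (!declared[i].isAssignableFrom(actual[i])) {
                return false;
            }
        }
        if (method.isVarArgs()) {
            // Every remaining argument must fit the element type of the array parameter.
            Class<?> element = declared[fixed].getComponentType();
            for (int i = fixed; i < actual.length; i++) {
                if (!element.isAssignableFrom(actual[i])) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Method eval = findEval(ConcatUdf.class);
        System.out.println(matches(eval, new Class<?>[] {String.class, String.class})); // true
        System.out.println(matches(eval, new Class<?>[] {}));                           // true
        System.out.println(matches(eval, new Class<?>[] {Integer.class}));              // false
    }
}
```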


          People

          • Assignee: clarkyzl Zhuoluo Yang
          • Reporter: clarkyzl Zhuoluo Yang
          • Votes: 0
          • Watchers: 6
