Flink / FLINK-5792

Improve UDF/UDTF to support constructors with parameters.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Table API & SQL
    • Labels:
      None

      Description

      Currently, the codegen phase creates UDF/UDTF instances through a no-argument constructor, so users cannot include state values in a UDF/UDTF. The codegen phase could instead use a serialization mechanism so that the UDF/UDTF can carry state values.
      1. Make UserDefinedFunction Serializable.
      2. Modify the UDF/UDTF part of CodeGenerator.
      3. Modify the UDF/UDTF part of the Table API.
      4. Add tests.
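
The motivation can be illustrated with plain Java, independent of Flink. The sketch below assumes a hypothetical `PrefixFunction` class standing in for a user-defined function (it is not Flink's `UserDefinedFunction`). It shows that a class with only a parameterized constructor cannot be created reflectively without arguments, while a Java serialization round trip keeps the constructor-supplied state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class StatefulUdfSketch {

    // Hypothetical stand-in for a user-defined scalar function; not a Flink class.
    static class PrefixFunction implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String prefix; // state supplied through the constructor

        PrefixFunction(String prefix) {
            this.prefix = prefix;
        }

        String eval(String value) {
            return prefix + value;
        }
    }

    // Writes the object to a byte array using Java serialization.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Reads an object back from a byte array.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Returns true if the class declares a constructor that takes no arguments.
    static boolean hasNoArgConstructor(Class<?> clazz) {
        try {
            clazz.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        PrefixFunction original = new PrefixFunction("id-");

        // Reflective no-argument instantiation is not possible for this class.
        System.out.println(hasNoArgConstructor(PrefixFunction.class)); // false

        // The serialized copy keeps the state passed to the constructor.
        PrefixFunction copy = (PrefixFunction) deserialize(serialize(original));
        System.out.println(copy.eval("42")); // id-42
    }
}
```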

        Activity

        clarkyzl Zhuoluo Yang added a comment -

        sunjincheng, could you please attach a design document to this ticket?

        jark Jark Wu added a comment -

        +1

        The serialization approach will be more flexible.

        fhueske Fabian Hueske added a comment -

        I agree, it makes sense to ship a serialized UDF object.
        Regular Flink functions are also serialized and distributed to the workers.

        sunjincheng, can you explain a bit how you want to distribute the UDF?
        I see two options:
        1. Make the UDF a member of the wrapping function. It might be a bit tricky to pass the reference into the code-gen'd function.
        2. Add a final byte[] field to the code-gen'd function that holds the serialized UDF object, and deserialize it during initialization. This will blow up the code-gen'd string but might work well.

        Best, Fabian
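
The second option can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the `Multiplier` class, the `serializedUdf` field name, and the choice of Base64 text instead of a byte[] literal are all hypothetical. It serializes an object, builds the field declaration a code generator might emit, and shows that the encoded text can be turned back into a working object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

class EmbeddedUdfFieldSketch {

    // Hypothetical UDF stand-in with a constructor parameter; not a Flink class.
    static class Multiplier implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int factor;

        Multiplier(int factor) {
            this.factor = factor;
        }

        int eval(int x) {
            return x * factor;
        }
    }

    // Serializes the object and encodes the bytes as Base64 text.
    static String encode(Serializable udf) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(udf);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Decodes Base64 text and deserializes the object it contains.
    static Object decode(String encoded) throws IOException, ClassNotFoundException {
        byte[] data = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Builds the source text of a field holding the encoded object.
    // Standard Base64 output needs no escaping inside a Java string literal.
    static String generateFieldDeclaration(String fieldName, String encoded) {
        return "private final String " + fieldName + " = \"" + encoded + "\";";
    }

    public static void main(String[] args) throws Exception {
        String encoded = encode(new Multiplier(3));
        String declaration = generateFieldDeclaration("serializedUdf", encoded);

        // The generated source grows with the size of the serialized object.
        System.out.println("generated declaration length: " + declaration.length());

        Multiplier restored = (Multiplier) decode(encoded);
        System.out.println(restored.eval(14)); // 42
    }
}
```

One design consideration with any embedded literal: a Java string constant is limited to 65535 bytes in the class file, so very large serialized objects would need to be split or shipped another way.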

        jark Jark Wu added a comment -

        Hi Fabian Hueske, sunjincheng has created a PR for this issue: https://github.com/apache/flink/pull/3330

        Sun's approach is similar to option 2: it serializes the UDF object into a string. Maybe we can move the discussion to the PR.

        fhueske Fabian Hueske added a comment -

        Ah, great. Thanks for pointing to the PR, Jark Wu.

        sunjincheng121 sunjincheng added a comment -

        Hi Fabian Hueske, thanks for your attention to this JIRA, and thanks to Jark Wu for reviewing the PR.
        I serialize the UDF object at the code-gen stage and deserialize it in the `open()` method. Jark Wu is right: the current implementation is similar to option 2. What do you think about the current PR?
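
The lifecycle described here (serialize at the code-gen stage, deserialize in `open()`) might look roughly like the sketch below. It is hand-written for illustration and is not the code produced by the PR: `GeneratedFilter` stands in for a generated function, and `Threshold` and all method names other than `open()` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

class OpenLifecycleSketch {

    // Hypothetical UDF stand-in whose state comes from the constructor.
    static class Threshold implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int limit;

        Threshold(int limit) {
            this.limit = limit;
        }

        boolean eval(int value) {
            return value > limit;
        }
    }

    // "Code-gen stage": turn the user's object into text that can be embedded.
    static String encode(Serializable udf) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(udf);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Hand-written stand-in for a generated function.
    static class GeneratedFilter {
        private final String serializedUdf; // fixed when the function is generated
        private Threshold udf;              // restored in open()

        GeneratedFilter(String serializedUdf) {
            this.serializedUdf = serializedUdf;
        }

        // Restores the UDF instance once, before any record is processed.
        void open() throws IOException, ClassNotFoundException {
            byte[] data = Base64.getDecoder().decode(serializedUdf);
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
                udf = (Threshold) in.readObject();
            }
        }

        // Per-record call that delegates to the restored UDF.
        boolean filter(int value) {
            if (udf == null) {
                throw new IllegalStateException("open() must be called before filter()");
            }
            return udf.eval(value);
        }
    }

    public static void main(String[] args) throws Exception {
        GeneratedFilter function = new GeneratedFilter(encode(new Threshold(10)));
        function.open();
        System.out.println(function.filter(11)); // true
        System.out.println(function.filter(10)); // false
    }
}
```

Deserializing in `open()` rather than on every call means the cost is paid once per function instance.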

        sunjincheng121 sunjincheng added a comment -

        BTW, while implementing this issue I tried using `kryo` to serialize and deserialize the UDTF/UDF object, which makes the byte[] very small.
        Unfortunately, it requires the serialized members to have a zero-parameter constructor, which is not user-friendly.

        clarkyzl Zhuoluo Yang added a comment -

        Hi sunjincheng, I think this feature is very important for FLINK-5802, because we need to pass something (e.g., a Hive UDF) to Flink's UDF/UDTF. Serialization is a good idea, IMHO.

        sunjincheng121 sunjincheng added a comment -

        Hi Zhuoluo Yang, thanks for your attention to this JIRA. After https://github.com/apache/flink/pull/3330 is merged into master, I'll open the PR for FLINK-5794.

        twalthr Timo Walther added a comment -

        All subtasks have been implemented. I will resolve this issue.


          People

          • Assignee: sunjincheng (sunjincheng121)
          • Reporter: sunjincheng (sunjincheng121)