Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-22913

Support Python UDF chaining in Python DataStream API

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Done
    • None
    • 1.14.0
    • API / Python
    • None
    • Hide
      The job graph of Python DataStream API jobs may be different from before as the Python functions will be chained as much as possible to optimize the performance. You could disable Python functions chaining by setting 'python.operator-chaining.enabled' as 'false' explicitly.
      Show
      The job graph of Python DataStream API jobs may be different from before as the Python functions will be chained as much as possible to optimize the performance. You could disable Python functions chaining by setting 'python.operator-chaining.enabled' as 'false' explicitly.

    Description

      Currently, for the following job:

      ds = ..
      ds.map(map_func1)
          .map(map_func2)
      

      The Python function `map_func1` and `map_func2` will runs in separate Python workers and the result of `map_func1` will be transferred to JVM and then transferred to `map_func2` which may resides in another Python worker. This introduces redundant communication and serialization/deserialization overhead.

      Attachments

        Activity

          People

            dian.fu Dian Fu
            dian.fu Dian Fu
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: