FLINK-18235

Improve the checkpoint strategy for Python UDF execution

Details

    Description

      Currently, when a checkpoint is triggered for the Python operator, all buffered data is flushed to the Python worker to be processed. This increases the overall checkpoint time when many elements are buffered and the Python UDF is slow. We should improve the checkpoint strategy to avoid this. One way to implement it is to limit the amount of data buffered in the pipeline between the Java and Python processes, similar to what FLIP-183 does to limit the amount of data buffered in the network. We can also let users configure the checkpoint strategy if needed.
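
      The idea of capping the buffered data could be modeled roughly as follows. This is only a toy sketch in plain Python, not Flink code: the class BoundedInFlightBuffer, its max_in_flight parameter, and the offer/flush methods are hypothetical names chosen for illustration. It shows that once the buffer is capped, the work a checkpoint-time flush has to do is bounded by the cap rather than by however much the upstream has produced.

```python
from collections import deque
from typing import Callable, Deque, List


class BoundedInFlightBuffer:
    """Hypothetical model of the Java-to-Python pipeline; not a Flink API."""

    def __init__(self, udf: Callable[[int], int], max_in_flight: int) -> None:
        if max_in_flight <= 0:
            raise ValueError("max_in_flight must be positive")
        self._udf = udf
        self._max_in_flight = max_in_flight  # assumed user-configurable cap
        self._pending: Deque[int] = deque()

    def offer(self, element: int) -> bool:
        """Buffer an element; return False once the cap is reached."""
        if len(self._pending) >= self._max_in_flight:
            return False  # caller must hold the element back (backpressure)
        self._pending.append(element)
        return True

    def flush(self) -> List[int]:
        """Process everything buffered, as a checkpoint forces today."""
        results = []
        while self._pending:
            results.append(self._udf(self._pending.popleft()))
        return results


# Five elements arrive, but at most three may be buffered at once.
buffer = BoundedInFlightBuffer(udf=lambda x: x * 2, max_in_flight=3)
accepted = [buffer.offer(i) for i in range(5)]
flushed = buffer.flush()
```

      In this sketch the flush at checkpoint time processes at most max_in_flight elements. The rejected elements stay with the caller, which is where backpressure would apply in a real pipeline.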

          People

            Assignee: Unassigned
            Reporter: Dian Fu (dian.fu)