Flink / FLINK-18235

Improve the checkpoint strategy for Python UDF execution

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: API / Python
    • Labels: None

      Description

       Currently, when a checkpoint is triggered for the Python operator, all buffered data is flushed to the Python worker to be processed. This increases the overall checkpoint time when many elements are buffered and the Python UDF is slow. We should improve the checkpoint strategy to reduce this cost, e.g. by storing the buffered data in state instead of flushing it out. We could also let users configure the checkpoint strategy if needed.
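The difference between the current flush-on-checkpoint behavior and the proposed "buffer into state" strategy can be sketched as a toy model. This is not Flink's implementation: `BufferingOperator`, `process_element`, `finish_bundle`, `snapshot_state`, `restore_state` and the `strategy` values `"flush"` / `"state"` are hypothetical names chosen for illustration, and the Python worker is reduced to a direct call of the UDF.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical toy model of a Python operator that buffers input elements.
# None of these names are Flink APIs; they only illustrate the two strategies.
@dataclass
class BufferingOperator:
    udf: Callable[[Any], Any]
    strategy: str = "flush"  # "flush" = current behavior, "state" = proposed
    buffer: List[Any] = field(default_factory=list)
    output: List[Any] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.strategy not in ("flush", "state"):
            raise ValueError(f"unknown checkpoint strategy: {self.strategy!r}")

    def process_element(self, value: Any) -> None:
        # Elements are buffered and sent to the (simulated) Python worker in batches.
        self.buffer.append(value)

    def finish_bundle(self) -> None:
        # Simulated round trip to the Python worker: run the UDF on every buffered element.
        self.output.extend(self.udf(v) for v in self.buffer)
        self.buffer.clear()

    def snapshot_state(self) -> Dict[str, List[Any]]:
        if self.strategy == "flush":
            # Current behavior: the checkpoint cannot complete until every
            # buffered element has been processed by the (possibly slow) UDF.
            self.finish_bundle()
            return {"buffer": []}
        # Proposed behavior: keep the unprocessed elements in operator state,
        # so the checkpoint only copies the buffer and never calls the UDF.
        return {"buffer": list(self.buffer)}

    def restore_state(self, snapshot: Dict[str, List[Any]]) -> None:
        # Re-buffer elements that were still unprocessed when the checkpoint was taken.
        self.buffer = list(snapshot["buffer"])
```

With `strategy="flush"`, `snapshot_state` runs the UDF over every buffered element before returning, so checkpoint time grows with the buffer size and the UDF cost. With `strategy="state"`, the checkpoint only copies the buffer; the elements are processed after the checkpoint, or again after restoring from it. A real implementation would additionally have to serialize the buffered elements, which this sketch ignores.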

            People

            • Assignee: Unassigned
            • Reporter: Dian Fu (dian.fu)
            • Votes: 0
            • Watchers: 1

              Dates

              • Created:
              • Updated: