Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-37065

MySQL cdc can lose/skip data during recovering from the checkpoint

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • cdc-3.2.0, cdc-3.2.1
    • None
    • None

    Description

      During each pipeline start (e.g., failover or restart), the Flink CDC connector retrieves the current GTID sets from the MySQL server and merges it with the pipeline's current state. This merged GTID set is then sent back to the MySQL server to indicate which transactions the Flink pipeline has already processed, ensuring that the server doesn’t resend processed transactions.

       

      Flink CDC MySQL Connector uses the fixRestoredGtidSet method to merge the GTID sets from the server with the GTID sets from the checkpoint. The method ensures that Flink will "tell" MySQL to skip over transactions it has already processed, avoiding duplication. However, the current implementation of this method doesn’t handle gaps caused by MySQL parallel execution. For example, if the restored GTID set is 1-80:83-90:92-98 and the server GTID set is 1-100, the method will merge gaps together and result will be 1-98, since it selects the highest gtid from checkpoint

      So in case if the pipeline has been restarted during any “gaps”, Flink CDC won’t process “gapped” transactions and will lose the data

      Attachments

        Issue Links

          Activity

            There are no comments yet on this issue.

            People

              Unassigned Unassigned
              imielientiev Ihor Mielientiev
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated: