FLINK-36081

Flink CDC MySQL source connector is missing data for some columns of newly added tables


Details

    Description

      Problem Description:

      When a table is added to a running job, the Flink CDC MySQL source connector can emit records for that table in which the data for some columns is missing.

      Reproduction Scenario:

      1. Remove a table from a CDC job that is running normally, then restart the job so that it resumes from its saved state.
      2. Add a column to the removed table.
      3. Add the table back to the job. The job keeps running without errors, but the synchronized records lack data for the column added in step 2.

      Cause Analysis:

      The MySQL CDC source keeps table schemas in its state. When the table is added back, the source recovers the table's schema from the previous state. Because that schema still exists and describes the table as it was before the column addition, the source builds the records it sends downstream from the stale cached schema. As a result, the emitted records are missing the fields for the newly added columns.
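
      The effect of a stale cached schema can be illustrated with a minimal sketch. This is a simplified model, not the connector's actual code: the class StaleSchemaSketch, the method emit, the table name db.orders, and the column names are all hypothetical, and schemas are reduced to plain lists of column names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StaleSchemaSketch {

    // Builds an output record that contains only the columns named in the cached schema.
    // Hypothetical helper: it stands in for "emit data based on the schema cached in state".
    public static Map<String, Object> emit(List<String> cachedColumns, Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String column : cachedColumns) {
            out.put(column, row.get(column));
        }
        return out;
    }

    public static void main(String[] args) {
        // Schema recovered from state: captured before the column was added.
        Map<String, List<String>> restoredSchemas =
                Map.of("db.orders", List.of("id", "amount"));

        // Row as it exists in MySQL after a column named "note" was added.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("amount", 9.5);
        row.put("note", "value of the new column");

        Map<String, Object> emitted = emit(restoredSchemas.get("db.orders"), row);

        // The emitted record has no "note" field because the cached schema predates it.
        System.out.println(emitted.containsKey("note"));
    }
}
```

      In this model the row itself carries the new column, but the output is shaped by the cached schema, which matches the symptom described above.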

      Proposed Solution:

      When a table is removed from the CDC job, its entry should also be removed from the MySQLBinlogSplit, so that no outdated schema is recovered when the table is added back.
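
      A minimal sketch of the proposed pruning step follows. It assumes the cached schemas can be modeled as a map keyed by table name; the class SchemaPruneSketch and the method pruneRemovedTables are hypothetical and do not reflect the connector's real types or method names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SchemaPruneSketch {

    // Returns a copy of the restored schema cache without entries for tables
    // that are no longer part of the job's captured table list.
    public static <S> Map<String, S> pruneRemovedTables(
            Map<String, S> restoredSchemas, Set<String> capturedTables) {
        Map<String, S> pruned = new HashMap<>(restoredSchemas);
        // retainAll on the key set removes the matching map entries as well.
        pruned.keySet().retainAll(capturedTables);
        return pruned;
    }

    public static void main(String[] args) {
        Map<String, String> restored = new HashMap<>();
        restored.put("db.orders", "schema before the column addition");
        restored.put("db.users", "users schema");

        // "db.orders" was removed from the job, so only "db.users" is still captured.
        Map<String, String> pruned = pruneRemovedTables(restored, Set.of("db.users"));

        System.out.println(pruned.containsKey("db.orders")); // prints "false"
    }
}
```

      With the stale entry gone, a table that is later added back has no cached schema to fall back on, so in this model its current schema would have to be read again.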

            People

              kevinwang Mingya Wang