Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-24607

SourceCoordinator may miss to close SplitEnumerator when failover frequently

    XMLWordPrintableJSON

Details

    Description

      We are having a connection leak problem when using mysql-cdc [1] source. We observed that many enumerators are not closed from the JM log.

      ➜  test123 cat jobmanager.log | grep "SourceCoordinator \[\] - Restoring SplitEnumerator" | wc -l
           264
      ➜  test123 cat jobmanager.log | grep "SourceCoordinator \[\] - Starting split enumerator" | wc -l
           264
      ➜  test123 cat jobmanager.log | grep "MySqlSourceEnumerator \[\] - Starting enumerator" | wc -l
           263
      ➜  test123 cat jobmanager.log | grep "SourceCoordinator \[\] - Closing SourceCoordinator" | wc -l
           264
      ➜  test123 cat jobmanager.log | grep "MySqlSourceEnumerator \[\] - Closing enumerator" | wc -l
           195
      

      We added "Closing enumerator" log in MySqlSourceEnumerator#close(), and "Starting enumerator" in MySqlSourceEnumerator#start(). From the above result you can see that SourceCoordinator is restored and closed 264 times, split enumerator is started 264 but only closed 195 times. It seems that SourceCoordinator misses to close enumerator when job failover frequently.

      I also went throught the code of SourceCoordinator and found some suspicious point:

      The started flag and enumerator is assigned in the main thread, however SourceCoordinator#close() is executed async by DeferrableCoordinator#closeAsync. That means the close method will check the started and enumerator variable async. Is there any concurrency problem here which mean lead to dirty read and miss to close the enumerator?

      I'm still not sure, because it's hard to reproduce locally, and we can't deploy a custom flink version to production env.

      [1]: https://github.com/ververica/flink-cdc-connectors

      Attachments

        1. jobmanager.log
          4.66 MB
          Jark Wu

        Issue Links

          Activity

            People

              becket_qin Jiangjie Qin
              jark Jark Wu
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: