Flink / FLINK-26548

The source parallelism is not set correctly with AdaptiveBatchScheduler


Details

    Description

      When running org.apache.flink.table.tpcds.TpcdsTestProgram with AdaptiveBatchScheduler, I ran into a problem: the number of records sent by the source operator is always 1, and the parallelism of the source operator is also 1, even though I set jobmanager.adaptive-batch-scheduler.default-source-parallelism to 8.

      After some research, I found that operator A is not the actual file reader: it only splits the files and assigns the splits to downstream tasks for further processing. Operator B is the actual file reader task. Here, the parallelism of operator B is 64 and the number of records sent by operator A is 1, which means operator A assigned all splits to a single task of operator B while the other 63 tasks of operator B sit idle. This is unreasonable.

      In this case, both the parallelism of operator B and the number of records sent by operator A should equal jobmanager.adaptive-batch-scheduler.default-source-parallelism.
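
      To make the imbalance concrete, here is a minimal sketch of the situation described above. It is not Flink code: the round-robin routing and the function names assign_round_robin and idle_subtasks are assumptions made for illustration only, and the real partitioning between operator A and operator B may differ. The numbers 1, 64, and 8 are the ones reported in this issue.

```python
def assign_round_robin(num_records: int, parallelism: int) -> dict:
    """Hypothetical model: record i emitted by operator A goes to
    subtask (i % parallelism) of operator B. Returns records per subtask."""
    load = {subtask: 0 for subtask in range(parallelism)}
    for i in range(num_records):
        load[i % parallelism] += 1
    return load


def idle_subtasks(load: dict) -> int:
    """Count the subtasks of operator B that received no record at all."""
    return sum(1 for count in load.values() if count == 0)


# Observed: operator A sends 1 record, operator B runs with parallelism 64.
observed = assign_round_robin(num_records=1, parallelism=64)

# Expected: both values equal default-source-parallelism (8 in this report).
expected = assign_round_robin(num_records=8, parallelism=8)

print("idle subtasks (observed):", idle_subtasks(observed))
print("idle subtasks (expected):", idle_subtasks(expected))
```

      Under this simplified model, the observed setup leaves 63 of 64 reader subtasks without input, while the expected setup gives every reader subtask one record.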

       

            People

              Leo Zhou zhouli