FLINK-32027

Batch jobs could hang at shuffle phase when max parallelism is really large


    Description

      In batch mode with the adaptive batch scheduler, if we set the max parallelism as large as 32768 (pipeline.max-parallelism), the job could hang at the shuffle phase.

      The job would hang for a long time and show "No bytes sent".

      After some debugging, we found that the downstream operator did not receive the end-of-partition event.
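
      For reference, a configuration along the following lines should correspond to the setup described above. This is only a sketch: pipeline.max-parallelism and its value 32768 come from the report, while the execution.runtime-mode and jobmanager.scheduler entries are assumptions about how batch mode and the adaptive batch scheduler were enabled.

      ```yaml
      # Assumed: the job runs in batch execution mode
      execution.runtime-mode: BATCH
      # Assumed: the adaptive batch scheduler is selected explicitly
      jobmanager.scheduler: AdaptiveBatch
      # From the report: very large max parallelism that triggers the hang
      pipeline.max-parallelism: 32768
      ```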


          People

            Weijie Guo
            Yun Tang
