FLINK-9289

Parallelism of generated operators should be the max parallelism of their inputs

    Details

      Description

      The DataSet API aims to chain generated operators such as key extraction mappers to their predecessor. This is done by assigning the same parallelism as the input operator.

      If a generated operator has more than two inputs, it can no longer be chained and is created with the default parallelism. This can lead to a

      NoResourceAvailableException: Not enough free slots available to run the job.

      as reported by a user on the mailing list: https://lists.apache.org/thread.html/60a8bffcce54717b6273bf3de0f43f1940fbb711590f4b90cd666c9a@%3Cuser.flink.apache.org%3E

      To fix this problem, I suggest setting the parallelism of a generated operator to the maximum parallelism of all of its inputs.
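      The proposed rule can be sketched in plain Java, without any Flink dependency. The class name, the helper proposedParallelism, and the numbers (2 slots, default parallelism 8, inputs at 2, 1 and 2) are all assumptions made for illustration; none of this is Flink's actual implementation. The slot check is also a simplification: it only assumes that an operator needs at least as many slots as its parallelism.

```java
import java.util.Arrays;

public class GeneratedOperatorParallelism {

    // Hypothetical helper, not a Flink API: the proposed parallelism of a
    // generated operator is the largest parallelism among its inputs.
    // Falls back to the default parallelism when no inputs are given.
    public static int proposedParallelism(int defaultParallelism, int... inputParallelisms) {
        return Arrays.stream(inputParallelisms).max().orElse(defaultParallelism);
    }

    public static void main(String[] args) {
        int availableSlots = 2;      // assumed cluster size
        int defaultParallelism = 8;  // assumed default parallelism
        int[] inputs = {2, 1, 2};    // assumed parallelism of the three inputs

        // Current behavior described in this issue: the unchained operator
        // is created with the default parallelism.
        int current = defaultParallelism;
        // Proposed behavior: use the max parallelism of the inputs.
        int proposed = proposedParallelism(defaultParallelism, inputs);

        System.out.println("current fits into slots:  " + (current <= availableSlots));
        System.out.println("proposed fits into slots: " + (proposed <= availableSlots));
    }
}
```

      With these assumed numbers, the default parallelism of 8 does not fit into 2 slots, while the max over the inputs (2) does.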

      Until the problem is fixed, a workaround is to set the default parallelism at the ExecutionEnvironment:

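      import org.apache.flink.api.java.ExecutionEnvironment;
      ExecutionEnvironment env = ...
      // Generated operators that cannot be chained run with this default parallelism.
      env.setParallelism(2);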
      

            People

            • Assignee: Xingcan Cui (xccui)
            • Reporter: Fabian Hueske (fhueske)