Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version/s: 1.4.2, 1.5.2, 1.6.0
Description
The DataSet API aims to chain generated operators, such as key extraction mappers, to their predecessor. It does this by assigning a generated operator the same parallelism as its input operator.
If a generated operator has more than one input, it can no longer be chained and is created with the default parallelism instead. If the default parallelism exceeds the number of available task slots, the job fails with
NoResourceAvailableException: Not enough free slots available to run the job.
as reported by a user on the mailing list: https://lists.apache.org/thread.html/60a8bffcce54717b6273bf3de0f43f1940fbb711590f4b90cd666c9a@%3Cuser.flink.apache.org%3E
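The failure mode can be sketched as plain slot arithmetic. This sketch assumes Flink's default slot sharing, under which a job needs as many task slots as its highest operator parallelism. The class name, the cluster size, and the parallelism values below are hypothetical illustrations, not taken from the reported job.

```java
import java.util.Arrays;

public class SlotCheck {
    // Assumes slot sharing (Flink's default): one slot can host one parallel
    // instance of every operator, so the job needs as many slots as its
    // highest operator parallelism.
    static int requiredSlots(int... operatorParallelisms) {
        return Arrays.stream(operatorParallelisms).max().orElse(0);
    }

    public static void main(String[] args) {
        int availableSlots = 2;     // hypothetical cluster size
        int inputParallelism = 2;   // both inputs explicitly set to 2
        int defaultParallelism = 4; // hypothetical default, e.g. number of cores

        // The generated key extraction mapper could not be chained,
        // so it falls back to the default parallelism.
        int needed = requiredSlots(inputParallelism, inputParallelism, defaultParallelism);

        System.out.println("required=" + needed + ", available=" + availableSlots
                + ", schedulable=" + (needed <= availableSlots));
    }
}
```

Running main prints required=4, available=2, schedulable=false: the single unchained operator alone raises the slot requirement beyond what the cluster offers.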
I suggest setting the parallelism of a generated operator to the maximum parallelism of all its inputs to fix this problem.
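A minimal sketch of the suggested rule, written as standalone Java rather than against Flink's internal operator classes. The class, the parallelismFor method, and the DEFAULT constant are hypothetical. Inputs whose parallelism is still unset are ignored unless all inputs are unset, which is a simplifying assumption of this sketch.

```java
import java.util.List;

public class GeneratedOperatorParallelism {
    // Hypothetical marker for "no explicit parallelism set". Flink's
    // ExecutionConfig also uses -1 for this, but here it is a local constant.
    static final int DEFAULT = -1;

    /**
     * Parallelism for a generated operator (e.g. a key extraction mapper),
     * given the parallelism of each of its inputs.
     * One input: copy it, so the operator can still be chained.
     * Several inputs: take the maximum of the inputs.
     * Simplification: DEFAULT inputs only win if every input is DEFAULT.
     */
    static int parallelismFor(List<Integer> inputParallelisms) {
        int max = DEFAULT;
        for (int p : inputParallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(parallelismFor(List.of(2)));                // single input
        System.out.println(parallelismFor(List.of(2, 3)));             // two inputs
        System.out.println(parallelismFor(List.of(DEFAULT, DEFAULT))); // nothing set
    }
}
```

With a single input the rule reduces to the existing behavior of copying the input's parallelism, so chaining still works. With several inputs, taking the maximum keeps the generated operator from raising the job's slot requirement, assuming slot sharing.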
Until the problem is fixed, a workaround is to set the default parallelism on the ExecutionEnvironment:
ExecutionEnvironment env = ...
env.setParallelism(2);