The DataSet API aims to chain generated operators such as key extraction mappers to their predecessor. This is done by assigning the same parallelism as the input operator.
If a generated operator has more than two inputs, it can no longer be chained and is generated with the default parallelism. This can lead to the problem
reported by a user on the mailing list: https://lists.apache.org/thread.html/60a8bffcce54717b6273bf3de0f43f1940fbb711590f4b90cd666c9a@%3Cuser.flink.apache.org%3E
I suggest setting the parallelism of a generated operator to the maximum of the parallelisms of all of its inputs to fix this problem.
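The suggested rule could look roughly like the following. This is only a sketch of the idea, not the actual translation code: the class and method names are hypothetical, and -1 is assumed to stand for "parallelism not set, use the default".

```java
import java.util.Arrays;

public class GeneratedOperatorParallelism {

    // Assumption for this sketch: -1 means "not set, fall back to the default".
    static final int PARALLELISM_DEFAULT = -1;

    // Hypothetical helper: pick the largest parallelism among all inputs.
    // With no inputs, or if every input is unset, the result stays at the default marker.
    static int forInputs(int... inputParallelisms) {
        return Arrays.stream(inputParallelisms).max().orElse(PARALLELISM_DEFAULT);
    }

    public static void main(String[] args) {
        // Two inputs with parallelism 4 and 16: the generated operator would get 16.
        System.out.println(forInputs(4, 16)); // prints 16
    }
}
```

Because any explicitly set parallelism is larger than the -1 marker, an input that is still unset never overrides one that has a concrete value.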
Until the problem is fixed, a workaround is to set the default parallelism explicitly on the ExecutionEnvironment.
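A minimal sketch of the workaround, assuming the Java DataSet API. The class name and the value 4 are placeholders; the value should match the parallelism used by the rest of the job.

```java
import org.apache.flink.api.java.ExecutionEnvironment;

public class DefaultParallelismWorkaround {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Set the default parallelism explicitly, so that generated operators
        // that cannot be chained fall back to this value.
        // 4 is a placeholder, not a recommended setting.
        env.setParallelism(4);

        // ... define and execute the DataSet program as usual ...
    }
}
```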