Details
Type: Bug
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Affects Version/s: 1.4.0, 1.5.0, 1.6.0
Fix Version/s: None
Description
Tasks can fail with a PartitionNotFoundException if deploying the producer takes too long. More specifically, if deployment takes longer than the time allowed by taskmanager.network.request-backoff.max, the consuming task gives up requesting the partition and fails.
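To make the failure mode concrete, here is a minimal sketch of a partition request that retries with a growing backoff and gives up once the cap is exhausted. It is an illustration only, not Flink's actual code: the class PartitionRequestBackoff, the exception PartitionNotFound, the method requestWithBackoff, and the doubling schedule are all assumptions made for this example, and sleeping is simulated by adding up the wait times.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical illustration of a capped retry backoff; not Flink's real classes. */
public class PartitionRequestBackoff {

    /** Stand-in for the failure described in this issue. */
    public static class PartitionNotFound extends RuntimeException {
        public PartitionNotFound(String message) {
            super(message);
        }
    }

    /**
     * Polls the producer until its partition is available.
     * Assumes 0 < initialBackoffMs <= maxBackoffMs.
     * The backoff starts at initialBackoffMs, doubles after each failed attempt,
     * and is capped at maxBackoffMs. One more failure after waiting at the cap
     * ends the retries with an exception.
     *
     * @return the total simulated wait in milliseconds before the partition was found
     */
    public static long requestWithBackoff(
            BooleanSupplier partitionAvailable, long initialBackoffMs, long maxBackoffMs) {
        long backoff = 0;
        long waited = 0;
        while (!partitionAvailable.getAsBoolean()) {
            if (backoff == 0) {
                backoff = initialBackoffMs;
            } else if (backoff < maxBackoffMs) {
                backoff = Math.min(backoff * 2, maxBackoffMs);
            } else {
                // The producer is still not reachable after the maximum backoff.
                throw new PartitionNotFound(
                        "partition not available after waiting " + waited + " ms");
            }
            waited += backoff; // simulated sleep; a real client would wait here
        }
        return waited;
    }
}
```

With an initial backoff of 100 ms and a cap of 400 ms, a producer that needs longer than 700 ms in total to come up would trigger the exception in this sketch, which mirrors the slow-deployment case described above.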
The cause is that we compute the InputGateDeploymentDescriptor for a consuming task as soon as the producer has been assigned a slot, without waiting until the producer is actually running. Waiting until the producer task is in state RUNNING before assigning the result partition to the consumer should fix the problem.
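The proposed fix could look roughly like the following sketch, which hands the partition to a consumer only after the producer has reached RUNNING. Everything here is hypothetical and single-threaded for brevity: ProducerRunningGate, its State enum, and the methods whenRunning and transitionTo are names invented for this example and do not correspond to Flink's scheduler API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: defer consumer wiring until the producer is RUNNING. */
public class ProducerRunningGate {

    /** Simplified producer life cycle, assumed for this example. */
    public enum State { SCHEDULED, DEPLOYING, RUNNING }

    private final String partitionId;
    private final List<Consumer<String>> waitingConsumers = new ArrayList<>();
    private State state = State.SCHEDULED;

    public ProducerRunningGate(String partitionId) {
        this.partitionId = partitionId;
    }

    /**
     * Registers a consumer deployment callback. It receives the partition id
     * immediately if the producer is already RUNNING, otherwise it is parked
     * until the producer gets there.
     */
    public void whenRunning(Consumer<String> consumerDeployment) {
        if (state == State.RUNNING) {
            consumerDeployment.accept(partitionId);
        } else {
            waitingConsumers.add(consumerDeployment);
        }
    }

    /** Records a producer state change and releases parked consumers on RUNNING. */
    public void transitionTo(State newState) {
        state = newState;
        if (newState == State.RUNNING) {
            for (Consumer<String> consumer : waitingConsumers) {
                consumer.accept(partitionId);
            }
            waitingConsumers.clear();
        }
    }
}
```

In this arrangement a slot assignment alone (SCHEDULED or DEPLOYING) never exposes the partition to a consumer, so the consumer's request timer does not start while the producer is still being deployed.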