Flink / FLINK-9413

Tasks can fail with PartitionNotFoundException if consumer deployment takes too long

      Description

      Tasks can fail with a PartitionNotFoundException if the deployment of the producer takes too long. More specifically, if deploying the producer takes longer than taskmanager.network.request-backoff.max, the consuming Task gives up requesting the partition and fails.
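
      To make the failure window concrete, here is a minimal sketch of a retry loop with a doubling backoff that is capped at a maximum. It is not Flink's actual implementation: the class and method names are hypothetical, the doubling schedule is an assumption, and the values 100 ms and 10000 ms are chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a consumer retrying a partition request.
// Assumption: the wait doubles after each failed request and is capped at maxMs.
class RequestBackoffSketch {

    // Returns the waits (in ms) between consecutive partition requests.
    static List<Integer> backoffSchedule(int initialMs, int maxMs) {
        if (initialMs <= 0 || maxMs < initialMs) {
            throw new IllegalArgumentException("need 0 < initialMs <= maxMs");
        }
        List<Integer> delays = new ArrayList<>();
        int current = initialMs;
        while (true) {
            delays.add(current);
            if (current >= maxMs) {
                break; // the cap has been reached, so no further retries follow
            }
            current = Math.min(current * 2, maxMs);
        }
        return delays;
    }

    // True if the producer's partition becomes available before the consumer gives up.
    static boolean consumerSurvives(long producerReadyAfterMs, int initialMs, int maxMs) {
        long elapsed = 0;
        if (producerReadyAfterMs <= elapsed) {
            return true; // the very first request already succeeds
        }
        for (int delay : backoffSchedule(initialMs, maxMs)) {
            elapsed += delay; // wait, then request again
            if (producerReadyAfterMs <= elapsed) {
                return true;
            }
        }
        return false; // retries exhausted: the case reported in this issue
    }
}
```

      Under these assumptions, a producer that needs 5 seconds to deploy is tolerated, while one that needs 30 seconds is not; raising the maximum backoff only widens the window and does not remove the race.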

      The problem is that we calculate the InputGateDeploymentDescriptor for a consuming task as soon as the producer has been assigned a slot, without waiting until the producer is actually running. The problem should be fixed if we wait until the producer task is in state RUNNING before assigning the result partition to the consumer.
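
      The difference between the current and the proposed behavior can be sketched as follows. This is an illustration only: the enum, its states, and both methods are hypothetical and do not correspond to Flink's scheduler classes.

```java
import java.util.Optional;

// Hypothetical model of when the producer's partition location is handed to the consumer.
class PartitionAssignmentSketch {

    // Simplified producer lifecycle; the state names are assumptions for this sketch.
    enum ProducerState { CREATED, SLOT_ASSIGNED, DEPLOYING, RUNNING }

    // Behavior described in the issue: the location is exposed as soon as a slot is assigned,
    // so the consumer may request a partition that the producer has not registered yet.
    static Optional<String> locationAtSlotAssignment(ProducerState state, String location) {
        return state == ProducerState.CREATED ? Optional.empty() : Optional.of(location);
    }

    // Proposed behavior: the location is exposed only once the producer is RUNNING.
    static Optional<String> locationOnceRunning(ProducerState state, String location) {
        return state == ProducerState.RUNNING ? Optional.of(location) : Optional.empty();
    }
}
```

      With the second variant, a consumer deployed while the producer is still in DEPLOYING receives no partition location and therefore cannot start the backoff countdown too early.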

            People

            • Assignee: zhangminglei (mingleizhang)
            • Reporter: Till Rohrmann (till.rohrmann)
            • Votes: 0
            • Watchers: 5