Flink / FLINK-35815

KinesisProxySyncV2 doesn't always retry throttling exceptions.


Details

    Description

      Problem:

      We have observed that throttled DescribeStreamSummary calls to Kinesis are not always retried.

      org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute application.
      ...
      Caused by: java.util.concurrent.CompletionException: org.apache.flink.util.FlinkRuntimeException: Could not execute application.
      ...
      Caused by: org.apache.flink.util.FlinkRuntimeException: Could not execute application.
      ... 
      Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Error registering stream:  <edited>
      ...
      Caused by: org.apache.flink.streaming.connectors.kinesis.util.StreamConsumerRegistrarUtil$FlinkKinesisStreamConsumerRegistrarException: Error registering stream: <edited>
      ...
      Caused by: org.apache.flink.kinesis.shaded.software.amazon.awssdk.services.kinesis.model.LimitExceededException: Rate exceeded for stream  <edited>. (Service: Kinesis, Status Code: 400, Request ID: efe4c2a9-3c3b-9c1c-b0ed-d9b05db93be2, Extended Request ID: pSG6kwQXgPWD2S7YoPT4RKf+g8QbRBaxc0grhNz6juEoti/uGUQTzyqsfmFCLSHoM+u1ydHvqxzsv/0ICUid6aTAQdndy2EO)
      ...
          at org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.lambda$describeStreamSummary$0(KinesisProxySyncV2.java:91)
          at org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.invokeWithRetryAndBackoff(KinesisProxySyncV2.java:175)
          at org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.describeStreamSummary(KinesisProxySyncV2.java:90)
          at org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.StreamConsumerRegistrar.registerStreamConsumer(StreamConsumerRegistrar.java:92)
          at org.apache.flink.streaming.connectors.kinesis.util.StreamConsumerRegistrarUtil.registerStreamConsumers(StreamConsumerRegistrarUtil.java:121)
      ... 

      The same problem occurs both with LAZY and EAGER registration strategies.

       

      Why does it get stuck?

      https://github.com/apache/flink-connector-aws/blob/c716ca439b2c8e6d4b5905a03c867c418e031688/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AwsV2Util.java#L77

      The `isRecoverableException` check inspects only the cause of the exception it is given, not the exception itself. In this case, `LimitExceededException` is thrown without a wrapper, so the throttling error is the exception being evaluated rather than its cause. It appears that this case is not treated as recoverable, and the call is therefore not retried.
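
The difference between a cause-only check and a check that also looks at the exception itself can be sketched as follows. This is a minimal, self-contained illustration, not the connector's code: `ThrottledException` is a hypothetical stand-in for the SDK's `LimitExceededException`, and both method names are made up for the example.

```java
import java.util.concurrent.CompletionException;

public class RecoverableCheckSketch {

    /** Hypothetical stand-in for the SDK's LimitExceededException. */
    public static class ThrottledException extends RuntimeException {
        public ThrottledException(String message) {
            super(message);
        }
    }

    /** Cause-only check: looks at e.getCause() and never at e itself. */
    public static boolean isRecoverableCauseOnly(Exception e) {
        return e.getCause() instanceof ThrottledException;
    }

    /** One possible alternative: check e itself, then each cause in the chain. */
    public static boolean isRecoverableAnywhere(Throwable e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof ThrottledException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Exception bare = new ThrottledException("Rate exceeded");
        Exception wrapped = new CompletionException(bare);

        System.out.println(isRecoverableCauseOnly(wrapped)); // true
        System.out.println(isRecoverableCauseOnly(bare));    // false: thrown without a wrapper
        System.out.println(isRecoverableAnywhere(wrapped));  // true
        System.out.println(isRecoverableAnywhere(bare));     // true
    }
}
```

Under these assumptions, the unwrapped exception is the only case the cause-only check misses, which matches the stack trace above where the throttling error surfaces directly from `describeStreamSummary`.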

            People

              a.pilipenko Aleksandr Pilipenko
              kdziolak Krzysztof Dziolak