The Flink Kinesis EFO consumer has a SubscribeToShard retry policy that terminates the job after a configured number of consecutive failed attempts. Under high backpressure, the Netty HTTP client throws a ReadTimeoutException when the consumer takes longer than 30s to process a batch. If this happens 10 times in a row (the default limit), the job terminates. Termination is unnecessary in this case, and the restart causes the job to fall further behind.
Exclude ReadTimeoutException from the SubscribeToShard retry policy so that the connector gracefully reconnects once the consumer has processed the queued records.
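
The intended behaviour can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: the class name `SubscribeRetrySketch`, the method `subscribeWithRetry`, and the local `ReadTimeoutException` stand-in (used here in place of Netty's `io.netty.handler.timeout.ReadTimeoutException` so the example has no dependencies) are all assumptions, and backoff between attempts is omitted for brevity.

```java
import java.util.concurrent.Callable;

public class SubscribeRetrySketch {

    /** Hypothetical stand-in for Netty's ReadTimeoutException. */
    public static class ReadTimeoutException extends RuntimeException {}

    /** Returns true if the throwable or any of its causes is a read timeout. */
    public static boolean isReadTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof ReadTimeoutException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Calls subscribe until it succeeds. Read timeouts trigger a reconnect
     * without counting toward maxAttempts; any other failure is counted,
     * and the last one is rethrown once maxAttempts is reached.
     */
    public static <T> T subscribeWithRetry(Callable<T> subscribe, int maxAttempts)
            throws Exception {
        int failedAttempts = 0;
        while (true) {
            try {
                return subscribe.call();
            } catch (Exception e) {
                if (isReadTimeout(e)) {
                    // Consumer is slow because of backpressure: resubscribe,
                    // but do not count this as a failed attempt.
                    continue;
                }
                failedAttempts++;
                if (failedAttempts >= maxAttempts) {
                    throw e; // retries exhausted, let the job fail
                }
            }
        }
    }
}
```

With this shape, any number of consecutive read timeouts leaves the job running, while other SubscribeToShard errors still terminate it after the configured number of attempts.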