[FLINK-23802] [kinesis][efo] Reduce ReadTimeoutExceptions for Kinesis Consumer - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.12.0, 1.12.1, 1.12.2, 1.13.0, 1.12.3, 1.13.1, 1.12.4, 1.12.5, 1.13.2
Fix Version/s: 1.14.0, 1.13.3, 1.12.8
Component/s: Connectors / Kinesis
Labels:
- pull-request-available

Description

Background

The Kinesis EFO consumer uses an asynchronous AWS SDK Netty client to read records from Kinesis. When the client is inactive for 30 seconds, Netty throws a ReadTimeoutException. The consumer then terminates the subscription, backs off, and retries. Jobs under high backpressure can hit ReadTimeoutException frequently, and the repeated backoff and retry causes unnecessary overhead.

What?

Reduce or eliminate ReadTimeoutExceptions in the EFO consumer.

How?

There are two improvements to be made:
1. Request the next record from the Flink source thread rather than from the AWS SDK response thread. This means there is always space in the input buffer queue, so the AWS SDK async response thread no longer blocks on that queue. Backpressure is then applied by the Flink source thread rather than by the AWS SDK thread.
2. Increase the read timeout (30s) to be higher than the maximum shard subscription duration (5m), and enable TCP keep-alive.
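
The first improvement can be illustrated with a minimal, self-contained sketch. It is not the connector's code: it uses the JDK's `java.util.concurrent.Flow` interfaces in place of the AWS SDK's reactive types, and the names `DemandDrivenSketch`, `BufferingSubscriber`, and `ListPublisher` are hypothetical. The point it tries to show is that the callback thread only ever performs a non-blocking enqueue, while the consuming thread (standing in for the Flink source thread) issues the next request after it has freed the buffer slot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicInteger;

class DemandDrivenSketch {

    /** Hypothetical subscriber: onNext never blocks; demand comes from the consumer. */
    static final class BufferingSubscriber implements Flow.Subscriber<String> {
        private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(1);
        private volatile Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1); // prime the pipeline with exactly one outstanding request
        }

        @Override
        public void onNext(String batch) {
            // Only one request is ever outstanding, so the slot is free:
            // add() succeeds immediately instead of blocking the callback thread.
            buffer.add(batch);
        }

        @Override public void onError(Throwable t) { }
        @Override public void onComplete() { }

        /** Called from the consuming thread (the Flink source thread in the ticket). */
        String next() {
            try {
                String batch = buffer.take(); // backpressure: the consumer waits here
                subscription.request(1);      // request more only after freeing the slot
                return batch;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    /** Hypothetical synchronous publisher that emits one list item per requested element. */
    static final class ListPublisher implements Flow.Publisher<String> {
        private final List<String> items;
        final AtomicInteger delivered = new AtomicInteger();

        ListPublisher(List<String> items) { this.items = items; }

        @Override
        public void subscribe(Flow.Subscriber<? super String> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                @Override public void request(long n) {
                    for (long i = 0; i < n; i++) {
                        int idx = delivered.get();
                        if (idx >= items.size()) return; // nothing left to emit
                        delivered.incrementAndGet();
                        subscriber.onNext(items.get(idx));
                    }
                }
                @Override public void cancel() { }
            });
        }
    }

    /** Drains three items through the subscriber and returns them in arrival order. */
    static List<String> demo() {
        ListPublisher publisher = new ListPublisher(List.of("a", "b", "c"));
        BufferingSubscriber subscriber = new BufferingSubscriber();
        publisher.subscribe(subscriber);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < 3; i++) out.add(subscriber.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the publisher is never more than one item ahead of the consumer, which is the property that keeps the response thread from stalling on a full queue.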

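The second improvement amounts to HTTP client configuration. The fragment below is a sketch only, assuming the AWS SDK for Java v2 Netty client is built directly; the class name `EfoHttpClientSketch` and the six-minute value are illustrative assumptions (any value above the five-minute subscription duration would serve), and the way the connector actually wires these settings is not shown in this issue.

```java
import java.time.Duration;

import software.amazon.awssdk.http.async.SdkAsyncHttpClient;
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;

class EfoHttpClientSketch {
    // Assumption: chosen only to exceed the 5-minute maximum shard subscription duration.
    static final Duration READ_TIMEOUT = Duration.ofMinutes(6);

    // Region and credentials are resolved from the SDK's default provider chains here.
    static KinesisAsyncClient createClient() {
        SdkAsyncHttpClient httpClient = NettyNioAsyncHttpClient.builder()
                .readTimeout(READ_TIMEOUT) // longer than one subscription lifetime
                .tcpKeepAlive(true)        // keep idle connections alive at the TCP level
                .build();
        return KinesisAsyncClient.builder()
                .httpClient(httpClient)
                .build();
    }
}
```
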
References

This has already been implemented and tested in amazon-kinesis-connector-flink:

Issue Links

links to:
- GitHub Pull Request #16839
- GitHub Pull Request #16840
- GitHub Pull Request #16841

People

Assignee: Danny Cranmer
Reporter: Danny Cranmer
Votes: 0
Watchers: 1

Dates

Created: 16/Aug/21 08:07
Updated: 15/Dec/21 01:40
Resolved: 17/Aug/21 11:59