Details
- Type: Task
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Docs Required, Release Notes Required
Description
Currently, CDC-through-Kafka applications use a single timeout property (kafkaRequestTimeout) for all Kafka-related operations instead of the built-in timeouts of the Kafka clients API. Moreover, its default value of 3 seconds does not match the Kafka client defaults:
Client | Timeout | Default value, s |
---|---|---|
KafkaProducer | delivery.timeout.ms | 120 |
KafkaProducer | request.timeout.ms | 30 |
KafkaConsumer | default.api.timeout.ms | 60 |
KafkaConsumer | request.timeout.ms | 30 |
The table below lists the places where kafkaRequestTimeout is explicitly passed as the total operation timeout instead of relying on the default timeout:
CDC application | API | Default timeout |
---|---|---|
ignite-cdc.sh: IgniteToKafkaCdcStreamer | KafkaProducer#send | delivery.timeout.ms * |
kafka-to-ignite.sh: KafkaToIgniteCdcStreamerApplier | KafkaConsumer#commitSync | default.api.timeout.ms |
kafka-to-ignite.sh: KafkaToIgniteCdcStreamerApplier | KafkaConsumer#close | KafkaConsumer#DEFAULT_CLOSE_TIMEOUT_MS (30s) |
kafka-to-ignite.sh: KafkaToIgniteMetadataUpdater | KafkaConsumer#partitionsFor | default.api.timeout.ms |
kafka-to-ignite.sh: KafkaToIgniteMetadataUpdater | KafkaConsumer#endOffsets | request.timeout.ms |
* The caller waits for the future for the specified timeout (kafkaRequestTimeout), but the future itself fails if the delivery timeout is exceeded.
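The footnote above can be illustrated without any Kafka classes. The following is a minimal sketch, assuming a plain `CompletableFuture` as a stand-in for the future returned by KafkaProducer#send; the class name `SendWaitSketch`, the method `waitForSend`, the `RuntimeException` used as the failure, and the scaled-down timings are all hypothetical and chosen only for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Stand-in for waiting on a send future; no Kafka classes are used. */
class SendWaitSketch {
    /**
     * Waits up to waitMs for a future that fails on its own after failAfterMs,
     * similar to a send future failing once the delivery timeout has elapsed.
     */
    static String waitForSend(long waitMs, long failAfterMs) {
        CompletableFuture<Void> sendFut = new CompletableFuture<>();

        // Hypothetical stand-in: the "producer" fails the record after the delivery timeout.
        CompletableFuture.delayedExecutor(failAfterMs, TimeUnit.MILLISECONDS)
            .execute(() -> sendFut.completeExceptionally(new RuntimeException("delivery timeout")));

        try {
            sendFut.get(waitMs, TimeUnit.MILLISECONDS);
            return "sent";
        } catch (TimeoutException e) {
            // The caller-side wait (the role played by kafkaRequestTimeout) ran out first.
            return "caller wait expired first";
        } catch (ExecutionException e) {
            // The future failed by itself before the caller-side wait ran out.
            return "future failed on its own: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Scaled-down timings: 30 ms plays kafkaRequestTimeout, 300 ms plays delivery.timeout.ms.
        System.out.println(waitForSend(30, 300));
        // A caller-side wait longer than the delivery timeout sees the future's own failure.
        System.out.println(waitForSend(600, 60));
    }
}
```

With a short caller-side wait the operation is abandoned long before the client's own delivery timeout could take effect, which is the situation the footnote describes for the 3s default.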
Timeouts for KafkaConsumer
All of the above methods fail with an exception when the specified timeout is exceeded, so the specified timeout should not be too low.
On the other hand, kafka-to-ignite.sh also invokes KafkaConsumer#poll with kafkaRequestTimeout as the timeout; the call blocks until data becomes available or the timeout expires [5]. So #poll should be called fairly often and its timeout should not be too large; otherwise, replication can be delayed when some topic partitions have no new data. This is undesirable, because other partitions then wait to be processed.
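The delay described above can be sketched with standard-library queues. This is only an analogy, assuming that one thread polls several sources one after another: `BlockingQueue#poll(timeout, unit)` stands in for KafkaConsumer#poll because it also blocks until data is available or the timeout expires. The class `PollDelaySketch`, the method `delayBeforeBusySource`, and the sequential polling order are assumptions made for illustration, not a description of the actual applier code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Analogy only: blocking queues stand in for consumers polled one after another. */
class PollDelaySketch {
    /**
     * Polls idleSourcesAhead empty sources, then one source that already has data,
     * and returns how many milliseconds passed before that data was picked up.
     */
    static long delayBeforeBusySource(int idleSourcesAhead, long pollTimeoutMs) {
        List<BlockingQueue<String>> sources = new ArrayList<>();
        for (int i = 0; i < idleSourcesAhead; i++)
            sources.add(new LinkedBlockingQueue<>()); // no new data

        BlockingQueue<String> busy = new LinkedBlockingQueue<>();
        busy.add("event");
        sources.add(busy);

        long start = System.nanoTime();
        try {
            for (BlockingQueue<String> src : sources)
                src.poll(pollTimeoutMs, TimeUnit.MILLISECONDS); // blocks until data or timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // One idle source ahead: the waiting data is delayed by roughly one poll timeout.
        System.out.println(delayBeforeBusySource(1, 200) + " ms");
        // No idle source ahead: the data is picked up almost immediately.
        System.out.println(delayBeforeBusySource(0, 200) + " ms");
    }
}
```

Under these assumptions each idle source adds about one full poll timeout to the delay, which is why a large poll timeout hurts even though a large timeout is desirable for the other operations.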
Kafka clients request retries
Each individual request is retried when request.timeout.ms is exceeded [2, 4]. Retry behavior is similar for KafkaConsumer and KafkaProducer. The minimum number of retries is approximately the ratio of the total operation timeout to request.timeout.ms. The total timeout is either an explicitly specified argument of the API method or the default value (see the tables above).
Consequently, kafkaRequestTimeout currently has to be N times greater than request.timeout.ms to make request retries possible, i.e. most of the time the default value of 3s has to be overridden in the CDC configuration.
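The ratio above can be worked through with the default values from the first table. The following is a rough sketch; the class `RetryBudget` and the method `attempts` are hypothetical names, and the estimate deliberately ignores retry backoff and the time spent on successful responses.

```java
/** Rough estimate of how many timed-out request attempts fit into a total operation timeout. */
class RetryBudget {
    /** Number of full request.timeout.ms windows that fit into the total timeout. */
    static long attempts(long totalTimeoutMs, long requestTimeoutMs) {
        if (requestTimeoutMs <= 0)
            throw new IllegalArgumentException("requestTimeoutMs must be positive");
        return totalTimeoutMs / requestTimeoutMs;
    }

    public static void main(String[] args) {
        // kafkaRequestTimeout default (3 s) against the request.timeout.ms default (30 s).
        System.out.println(attempts(3_000, 30_000));   // 0
        // Producer defaults: delivery.timeout.ms (120 s) against request.timeout.ms (30 s).
        System.out.println(attempts(120_000, 30_000)); // 4
        // Consumer defaults: default.api.timeout.ms (60 s) against request.timeout.ms (30 s).
        System.out.println(attempts(60_000, 30_000));  // 2
    }
}
```

A result of 0 for the 3s default means the operation gives up before even one request has had the chance to time out and be retried.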
Conclusion
- A better approach seems to be to rely only on the built-in Kafka client timeouts, because the Kafka clients already provide connection reliability features. These timeouts should be configured according to the Kafka documentation.
- kafkaRequestTimeout should be used only for KafkaConsumer#poll; its default value of 3s can remain the same.
- As an alternative to points 1 and 2, we can add a separate timeout for KafkaConsumer#poll. The default timeouts for all other operations then have to be increased.
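As an illustration of the first two points, the client timeouts could be set through ordinary client properties while the short timeout is kept for polling only. This is a sketch under assumptions: the class `CdcKafkaTimeouts` and the constant `POLL_TIMEOUT_MS` are hypothetical, the values are simply the defaults from the tables above converted to milliseconds, and connection settings such as the bootstrap servers are omitted.

```java
import java.util.Properties;

/** Hypothetical holder for timeout settings; property keys are the Kafka client config names. */
class CdcKafkaTimeouts {
    /** Short timeout intended only for KafkaConsumer#poll (the current 3 s default). */
    static final long POLL_TIMEOUT_MS = 3_000;

    /** Producer timeouts, left at the Kafka defaults listed in the first table. */
    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("delivery.timeout.ms", "120000");
        p.setProperty("request.timeout.ms", "30000");
        return p;
    }

    /** Consumer timeouts, left at the Kafka defaults listed in the first table. */
    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("default.api.timeout.ms", "60000");
        p.setProperty("request.timeout.ms", "30000");
        return p;
    }

    public static void main(String[] args) {
        System.out.println("producer: " + producerProps());
        System.out.println("consumer: " + consumerProps());
        System.out.println("poll timeout, ms: " + POLL_TIMEOUT_MS);
    }
}
```

Keeping the poll timeout separate from the client properties lets the long-running operations retry within the Kafka-managed timeouts while polling stays responsive.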
PS
The described behaviour applies to Kafka 2.7.
Links:
- [1] https://kafka.apache.org/27/documentation.html#producerconfigs_delivery.timeout.ms
- [2] https://kafka.apache.org/27/documentation.html#producerconfigs_request.timeout.ms
- [3] https://kafka.apache.org/27/documentation.html#consumerconfigs_default.api.timeout.ms
- [4] https://kafka.apache.org/27/documentation.html#consumerconfigs_request.timeout.ms
- [5] https://kafka.apache.org/27/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#poll-java.time.Duration-
Issue Links
- is required by: IGNITE-18574 CDC: add documentation about Kafka request timeouts (Resolved)