[KAFKA-16931] Transient REST failures to forward fenceZombie requests leave Connect Tasks in FAILED state - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: connect
Labels: None

Description

When Kafka Connect runs in exactly_once mode, a task restart fences any possible zombie tasks.

This is achieved by forwarding the request to the leader worker over the REST protocol.

At scale, in distributed mode, an HTTPS request may occasionally fail because of a networking glitch, a reconfiguration, etc.

Currently there is no attempt to retry the REST request; the task is left in a FAILED state and requires an external restart (via the REST API).
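For reference, the external restart mentioned above corresponds to the task restart endpoint of the Connect REST API, `POST /connectors/{connector}/tasks/{task}/restart`. A minimal Java sketch follows; the worker address `http://localhost:8083`, the connector name `my-source`, the task id `0`, and the class and method names are all assumptions for illustration, and the connector name is assumed to need no URL encoding.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestartFailedTask {

    // Builds POST {baseUrl}/connectors/{connector}/tasks/{taskId}/restart.
    // baseUrl, connector and taskId are placeholders supplied by the caller.
    static HttpRequest restartRequest(String baseUrl, String connector, int taskId) {
        URI uri = URI.create(baseUrl + "/connectors/" + connector
                + "/tasks/" + taskId + "/restart");
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical worker address, connector name and task id.
        HttpRequest request = restartRequest("http://localhost:8083", "my-source", 0);
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
    }
}
```

This is the manual step that an operator or an external monitor has to perform today for each task left in FAILED state.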

Would this issue require a small KIP to introduce configuration entries that limit the number of retries, the backoff times, etc.?
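To make the question concrete, the kind of bounded retry being discussed could look roughly like the sketch below. It is not Connect's code: `BoundedRetry`, `callWithRetry`, `backoffMs` and the parameters `maxRetries`, `initialMs` and `maxMs` are hypothetical stand-ins for whatever configuration entries a KIP might define, and the forwarded REST request is represented by a plain `Callable`. Only `IOException` is treated as transient here, which is itself an assumption.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class BoundedRetry {

    // Exponential backoff: initialMs doubled once per previous attempt, capped at maxMs.
    // attempt is 0-based (0 = delay before the first retry).
    static long backoffMs(int attempt, long initialMs, long maxMs) {
        long delay = initialMs;
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    // Runs the call, retrying up to maxRetries times on IOException.
    // Any other exception, or an IOException after the last retry, propagates.
    static <T> T callWithRetry(Callable<T> call, int maxRetries,
                               long initialMs, long maxMs) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the failure as today
                }
                Thread.sleep(backoffMs(attempt, initialMs, maxMs));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request that fails twice with a transient error, then succeeds.
        String result = callWithRetry(() -> {
            if (calls[0]++ < 2) {
                throw new IOException("simulated transient failure");
            }
            return "fenced";
        }, 3, 100L, 1000L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

With a retry limit in place, a task would still end up FAILED once the retries are exhausted, so the current behavior remains the fallback rather than being replaced.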

People

Assignee: Unassigned

Reporter: Edoardo Comar

Votes: 1

Watchers: 2

Dates

Created: 11/Jun/24 13:52

Updated: 11/Jun/24 16:23