Description
If a connector hangs during any of its initialize, start, stop, taskConfigs, taskClass, version, config, or validate methods, the worker will be disabled for some types of requests thereafter, including connector creation, connector reconfiguration, and connector deletion.
This affects both distributed and standalone mode. Distributed herders perform some connector work synchronously in their tick thread, which also handles group membership and some REST requests. Most of the standalone herder's methods are synchronized, including those for creating, updating, and deleting connectors; while one of those methods is blocked, all subsequent calls to any of them are blocked as well.
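The standalone-mode behavior can be reproduced with a toy example. The sketch below is not Connect code: `ToyHerder`, `createConnector`, and `deleteConnector` are hypothetical names, and the only assumption carried over from the description is that the herder methods are `synchronized`. It shows that while one synchronized method is stuck inside a hanging connector call, a call to another synchronized method on the same object cannot make progress.

```java
import java.util.concurrent.CountDownLatch;

public class SynchronizedHerderDemo {
    // Hypothetical stand-in for a herder whose methods share one monitor.
    static class ToyHerder {
        synchronized void createConnector(Runnable connectorStart) {
            connectorStart.run(); // a hanging connector blocks here while holding the lock
        }

        synchronized void deleteConnector() {
            // nothing to do; the caller only needs to acquire the lock
        }
    }

    // Returns true if deleteConnector() was still blocked while createConnector() hung.
    public static boolean deleteBlockedWhileCreateHangs() throws InterruptedException {
        ToyHerder herder = new ToyHerder();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread creator = new Thread(() -> herder.createConnector(() -> {
            entered.countDown();
            try {
                release.await(); // simulate a connector start() that hangs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        creator.setDaemon(true);
        creator.start();
        entered.await(); // the creator now holds the herder's monitor

        Thread deleter = new Thread(herder::deleteConnector);
        deleter.setDaemon(true);
        deleter.start();
        deleter.join(200); // give the delete request a chance to finish
        boolean blocked = deleter.isAlive();

        release.countDown(); // un-hang the connector so both threads can finish
        deleter.join();
        creator.join();
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("delete blocked: " + deleteBlockedWhileCreateHangs());
    }
}
```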
One potential solution is to treat connectors that fail to start, stop, etc. in time the same way as tasks that fail to stop within the task graceful shutdown timeout period. All connector interactions would be handled on a separate thread, and the herder would wait for each one to complete within a timeout. If that timeout expires, the thread would be abandoned and the connector transitioned to the FAILED state (if it has been created at all).
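A minimal sketch of that idea, using only `java.util.concurrent`, might look like the following. It is an illustration under stated assumptions, not the fix that was eventually implemented: `BoundedConnectorCall`, `runWithTimeout`, and the `Outcome` enum are hypothetical names, and the connector interaction is modeled as a plain `Runnable`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedConnectorCall {
    // Hypothetical result type; FAILED mirrors the connector state named in the description.
    public enum Outcome { COMPLETED, FAILED }

    // Runs one connector interaction on its own thread and waits at most timeoutMs for it.
    public static Outcome runWithTimeout(Runnable connectorCall, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connector-call");
            t.setDaemon(true); // an abandoned thread must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(connectorCall);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            // Best-effort interrupt; if the connector ignores it, the thread is simply abandoned.
            future.cancel(true);
            return Outcome.FAILED;
        } catch (ExecutionException e) {
            // The connector method threw instead of hanging.
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.FAILED;
        } finally {
            executor.shutdownNow(); // never blocks waiting for the hung call
        }
    }

    public static void main(String[] args) {
        Outcome quick = runWithTimeout(() -> { }, 1000);
        Outcome hung = runWithTimeout(() -> {
            try {
                Thread.sleep(60_000); // simulate a connector method that hangs
            } catch (InterruptedException ignored) {
                // interrupted by cancel(true)
            }
        }, 100);
        System.out.println(quick + " " + hung);
    }
}
```

The calling thread (for example a herder's tick thread) is blocked for at most the timeout, so group membership and REST handling can continue even when a connector never returns. The trade-off is that a thread stuck in non-interruptible code is leaked rather than reclaimed.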
Issue Links
- causes
  - KAFKA-12904 Connect's validate REST endpoint uses incorrect timeout (Resolved)
- is related to
  - KAFKA-15238 Connect workers can be disabled by DLQ-related blocking admin client calls (Resolved)
  - KAFKA-14670 Refactor connect plugins to be called from wrapper classes (In Progress)
  - KAFKA-9975 KIP-611: Improved Handling of Abandoned Connectors and Tasks (Resolved)