Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-16931

Transient REST failures to forward fenceZombie requests leave Connect Tasks in FAILED state

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • connect
    • None

    Description

      When Kafka Connect runs in exactly_once mode, a task restart will fence possible zombies tasks.

      This is achieved forwarding the request to the leader worker using the REST protocol.

      At scale, in distributed mode, occasionally an HTTPs request may fail because of a networking glitch, reconfiguration etc

      Currently there is no attempt to retry the REST request, the task is left in a FAILED state and requires an external restart (with the REST API).

      Would this issue require a small KIP to introduce configuration entries to  limit the number of retries, backoff times etc ?

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            ecomar Edoardo Comar
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: