Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-2365 Copycat checklist
  3. KAFKA-2479

Add CopycatExceptions to indicate transient and permanent errors in a connector/task

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.10.0.0
    • connect
    • None

    Description

      Sometimes the connector will need to indicate to the framework that an error occurred, but the error could have multiple responses by the framework.

      For source connectors, there's not much they need to indicate since they can block indefinitely. They probably only need to indicate permanent errors for correctness, though we may want them to indicate transient errors so we can report health of the task in a metric.

      For sink connectors, there are at least a couple of scenarios:
      1. A task encounters some error while processing a put(records) call and was unable to fully process it, but thinks it could be resolved in the future. The task doesn't want to see any new records until the issue is resolved, but will need to see the same set of records again. (It would be nice if the task doesn't have to deal with saving these to a buffer itself.)
      2. A task encounters some error while processing data, but it has enqueued/handled the data passed into the put(records) call. For example, it may have passed it to some library which buffers it, but then the library indicated that it is having some connection issues. The connector might be able accept more data, but the task is not in a healthy state.
      3. The task encounters some error that it decides is unrecoverable. This might just be transient errors that repeat for long enough that the task thinks its time to give up. Unclear what to do here, but one option is relocating the task to another worker, hoping that the issue is specific to the worker.

      Note that it is not, generally, safe for sink tasks to do their own backoff or we'd potentially starve the consumer, which needs to poll() in order to heartbeat. So we need to make sure whatever mechanism we implement encourages the user to throw an exception and pass control back to us instead.

      Attachments

        Issue Links

          Activity

            People

              liquanpei Liquan Pei
              ewencp Ewen Cheslack-Postava
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: