Description
Sometimes the connector will need to indicate to the framework that an error occurred, but the framework could reasonably respond to that error in several different ways.
For source connectors, there is not much they need to indicate since they can block indefinitely in poll(). They probably only need to indicate permanent errors for correctness, though we may also want them to indicate transient errors so we can report the health of the task in a metric.
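To make that concrete, here is a minimal sketch of how a source task can already express both cases through poll(): throwing fails the task, while backing off and returning nothing covers transient problems (a source task's thread is not a consumer, so sleeping here starves nothing). The fetchBatch() helper and both exception types are hypothetical stand-ins for whatever the upstream system provides.
{code:java}
import java.util.List;

import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public abstract class ExampleSourceTask extends SourceTask {

    /** Hypothetical exceptions thrown by the upstream system. */
    static class PermanentSourceException extends Exception {}
    static class TransientSourceException extends Exception {}

    /** Hypothetical helper that reads the next batch from the upstream system. */
    protected abstract List<SourceRecord> fetchBatch()
            throws PermanentSourceException, TransientSourceException;

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        try {
            return fetchBatch();
        } catch (PermanentSourceException e) {
            // Permanent error: fail the task so the framework can surface it.
            throw new ConnectException("unrecoverable source error", e);
        } catch (TransientSourceException e) {
            // Transient error: poll() may safely block or back off, since this
            // thread does not have to poll() a consumer to heartbeat. Today this
            // backoff is invisible to the framework, which is why a health
            // metric would need an explicit signal.
            Thread.sleep(1_000);
            return null; // returning null/empty from poll() is allowed
        }
    }
}
{code}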
For sink connectors, there are at least a couple of scenarios:
1. A task encounters some error while processing a put(records) call and is unable to fully process it, but thinks the problem could be resolved in the future. The task doesn't want to see any new records until the issue is resolved, but will need to see the same set of records again. (It would be nice if the task didn't have to save these to a buffer itself.)
2. A task encounters some error while processing data, but it has already enqueued/handled the data passed into the put(records) call. For example, it may have passed the records to a library that buffers them, and the library then indicated that it is having connection issues. The connector might be able to accept more data, but the task is not in a healthy state.
3. The task encounters some error that it decides is unrecoverable. This might just be transient errors that repeat for long enough that the task thinks it's time to give up. It's unclear what to do here, but one option is relocating the task to another worker, in the hope that the issue is specific to the original worker. (Scenarios 1 and 3 are sketched in code right after this list; scenario 2 further below.)
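As a strawman for scenarios 1 and 3, the following sketch uses an exception-based mechanism matching the RetriableException and SinkTaskContext.timeout() API that Kafka Connect ships today: throwing a retriable exception asks the framework to back off and redeliver the same batch, while any other ConnectException fails the task. The ExternalClient interface, its exception type, and the retry threshold are hypothetical.
{code:java}
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class ExampleSinkTask extends SinkTask {

    /** Hypothetical exception the external system throws on a transient failure. */
    static class TransientWriteException extends Exception {
        TransientWriteException(Throwable cause) { super(cause); }
    }

    /** Hypothetical client for whatever system the task writes to. */
    interface ExternalClient {
        void write(Collection<SinkRecord> records) throws TransientWriteException;
        void close();
    }

    private ExternalClient client;
    private int consecutiveFailures = 0;
    private static final int MAX_RETRIES = 10; // hypothetical give-up threshold

    @Override
    public void start(Map<String, String> props) {
        // client = ... construct from props (omitted)
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        try {
            client.write(records);
            consecutiveFailures = 0;
        } catch (TransientWriteException e) {
            if (++consecutiveFailures > MAX_RETRIES) {
                // Scenario 3: the "transient" error has repeated long enough;
                // fail the task and let the framework decide what to do.
                throw new ConnectException("giving up after repeated write failures", e);
            }
            // Scenario 1: ask the framework to back off and redeliver this same
            // batch. Throwing here (instead of sleeping) returns control to the
            // worker, so the underlying consumer keeps polling and heartbeating.
            context.timeout(5_000); // retry backoff in ms
            throw new RetriableException(e);
        }
    }

    @Override
    public void stop() {
        if (client != null) client.close();
    }

    @Override
    public String version() {
        return "0.0.1";
    }
}
{code}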
Note that it is generally not safe for sink tasks to do their own backoff, or we'd risk starving the consumer, which needs to poll() in order to heartbeat. So we need to make sure whatever mechanism we implement encourages the user to throw an exception and pass control back to us instead; a pause/resume-based sketch for scenario 2 follows.
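Scenario 2 calls for backpressure rather than redelivery. Here is a sketch assuming the pause()/resume() methods that SinkTaskContext exposes: the task keeps accepting put() calls (its client buffers them) but asks the framework to stop delivering new records while the client is unhealthy, and the framework keeps polling the consumer underneath so heartbeats continue. The BufferingClient interface is a hypothetical stand-in.
{code:java}
import java.util.Collection;

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public abstract class BufferingSinkTask extends SinkTask {

    /** Hypothetical async client that buffers writes internally. */
    interface BufferingClient {
        void enqueue(Collection<SinkRecord> records);
        boolean isHealthy(); // e.g. reports connection problems
    }

    private BufferingClient client; // initialized in start() (omitted)
    private boolean paused = false;

    @Override
    public void put(Collection<SinkRecord> records) {
        // The client accepts and buffers the records even while its connection
        // is flaky, so this batch counts as handled (scenario 2).
        client.enqueue(records);

        TopicPartition[] assignment = context.assignment().toArray(new TopicPartition[0]);
        if (!client.isHealthy() && !paused) {
            // Stop the flow of new records without blocking: the framework
            // keeps calling the consumer's poll() underneath, so heartbeats
            // continue while no new data is delivered.
            context.pause(assignment);
            paused = true;
        } else if (client.isHealthy() && paused) {
            // put() is still invoked (possibly with no records) while paused,
            // giving the task a chance to resume once the client recovers.
            context.resume(assignment);
            paused = false;
        }
    }
}
{code}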