A number of components support resuming operation from a given point in time or index, but each component implements its own logic and it is currently not possible to use an external system to store such information.
Because of that, even if the source supports resuming from a given point, we sometimes still need to use an idempotent consumer. That works but is inefficient, because the consumer may need to re-process events from the source as it lacks a way to retrieve where it left off.
We should provide a generic mechanism to implement such logic so that:
- it is possible to restart a Camel application without re-processing events from the source, even without an idempotent repository (for sources that support resuming)
- for sources that support partitioning, it is possible to consume data from an elastic pool of Camel consumers: because the consumer state is stored in an external repository, the system can rebalance the load and new consumers can start where others left off
This repository (the InProgressRepository) is similar to the IdempotentRepository, except that it should allow querying the status of its entries.
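To make the difference with an idempotent repository concrete, here is a minimal sketch of what such a repository could look like. Everything in it is an assumption made for illustration: the `InProgressRepository` interface, its `update` and `get` methods, the `Entry` and `Status` types, and the in-memory implementation are hypothetical and are not existing Camel APIs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: these types are illustrative and are not part of Camel.
class InProgressRepositorySketch {

    enum Status { IN_PROGRESS, DONE }

    // Opaque resume value (offset, file position, ...) plus the consumer status.
    static final class Entry {
        final String value;
        final Status status;

        Entry(String value, Status status) {
            this.value = value;
            this.status = status;
        }
    }

    // Unlike an idempotent repository, entries can be read back,
    // not only tested for presence.
    interface InProgressRepository {
        void update(String key, String value, Status status);

        Optional<Entry> get(String key);
    }

    // In-memory stand-in; a real implementation would write to an external store.
    static final class MemoryInProgressRepository implements InProgressRepository {
        private final Map<String, Entry> entries = new ConcurrentHashMap<>();

        @Override
        public void update(String key, String value, Status status) {
            entries.put(key, new Entry(value, status));
        }

        @Override
        public Optional<Entry> get(String key) {
            return Optional.ofNullable(entries.get(key));
        }
    }

    public static void main(String[] args) {
        InProgressRepository repo = new MemoryInProgressRepository();
        repo.update("orders.csv", "line=120", Status.IN_PROGRESS);

        // After a restart, a consumer looks up where the previous run stopped.
        repo.get("orders.csv").ifPresent(e ->
                System.out.println(e.value + " " + e.status));
    }
}
```

The value is kept as an opaque string here so that each component can encode its own notion of position; a real design might prefer a typed or binary value.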
We should also provide a simple strategy that automatically updates the InProgressRepository according to headers such as:
- CamelInProgressID: the key of the event
- CamelInProgressValue: the value (opaque)
- CamelInProgressStatus: the status of the consumer for the given key (e.g. InProgress while the consumer is processing a file, becoming Done once the file is fully processed)
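The header-driven strategy could be sketched roughly as follows. This is only an illustration under stated assumptions: a plain `Map` stands in for the message headers, the `Repository` interface stands in for the proposed InProgressRepository, and the `apply` method and the default status for a missing header are hypothetical choices, not an agreed design.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for the message headers and
// Repository stands in for the proposed InProgressRepository.
class InProgressStrategySketch {

    static final String ID = "CamelInProgressID";
    static final String VALUE = "CamelInProgressValue";
    static final String STATUS = "CamelInProgressStatus";

    interface Repository {
        void update(String key, Object value, String status);
    }

    // Copies the three headers into the repository. Returns false when the
    // message carries no key, in which case nothing is recorded.
    static boolean apply(Map<String, Object> headers, Repository repository) {
        Object key = headers.get(ID);
        if (key == null) {
            return false;
        }
        // Assumption: a missing status means the key is still being processed.
        Object status = headers.get(STATUS);
        String statusName = status == null ? "InProgress" : status.toString();
        repository.update(key.toString(), headers.get(VALUE), statusName);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        Repository repository = (key, value, status) -> store.put(key, value + "/" + status);

        Map<String, Object> headers = new HashMap<>();
        headers.put(ID, "orders.csv");
        headers.put(VALUE, 120);
        headers.put(STATUS, "InProgress");
        apply(headers, repository);

        // The same key is updated again once the file is fully processed.
        headers.put(STATUS, "Done");
        apply(headers, repository);
        System.out.println(store);
    }
}
```

Keeping the strategy as a function of the headers alone means components only have to set three headers and never talk to the repository directly.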
We should also implement a mechanism to consistently determine the "partition key" according to the active consumers.
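One possible way to get a consistent assignment is rendezvous (highest random weight) hashing, sketched below. This is a swapped-in technique chosen for illustration, not something this issue prescribes: the `ownerOf` and `score` names, the use of CRC32 as the hash, and the idea that every consumer knows the list of active consumer ids are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch of rendezvous hashing: every consumer runs the same
// computation over the same set of active consumers and reaches the same answer.
class PartitionAssignmentSketch {

    // Deterministic score for a (consumer, partition) pair; CRC32 is used only
    // because it is stable across JVMs, not because it is the best hash.
    static long score(String consumerId, String partitionKey) {
        CRC32 crc = new CRC32();
        crc.update((consumerId + "|" + partitionKey).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Returns the consumer that owns the partition, or null when none is active.
    static String ownerOf(String partitionKey, Collection<String> activeConsumers) {
        String owner = null;
        long best = -1;
        for (String consumer : activeConsumers) {
            long s = score(consumer, partitionKey);
            // Ties are broken by consumer id so the result does not depend
            // on iteration order.
            if (s > best || (s == best && consumer.compareTo(owner) < 0)) {
                best = s;
                owner = consumer;
            }
        }
        return owner;
    }

    public static void main(String[] args) {
        List<String> consumers = List.of("consumer-1", "consumer-2", "consumer-3");
        for (String partition : List.of("partition-0", "partition-1", "partition-2")) {
            System.out.println(partition + " -> " + ownerOf(partition, consumers));
        }
    }
}
```

With this scheme, a consumer leaving the pool only moves the partitions it owned, and a joining consumer only takes over the partitions it now scores highest for. The new owner could then look up the stored state for that partition and resume where the previous owner left off.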