Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
The original discussions are at https://github.com/gearpump/gearpump/issues/1528 and https://github.com/gearpump/gearpump/issues/354.
When a message flows through a stream processing system, the system will try to provide some guarantee on message delivery From the weakest to strongest, there are.
- At most once delivery
a message is processed zero or one times. Messages can be lost.
- At least once delivery
a message is processed one or more times such that at least one of them succeeds. Messages can not be lost but can be duplicated.
- Exactly once delivery
a message is processed exactly once. Messages can neither be lost nor duplicated.
Gearpump tracks message loss between a sender Task and a receiver Task and replays the application on message loss. If the source is TimeReplayable, then at-least-once delivery can be guaranteed. In addition, if user state is stored through PersistentState API, then exactly-once delivery is guaranteed. Otherwise, at-most-once delivery is guaranteed.
There are several limitations with the current implementation.
- If users only require at-most-once delivery, message loss track is not necessary and we may get better performance without it.
- We require user's data source to be TimeReplayable for at-least-once/exactly-once delivery. It would be better if we provide a TimeReplayable wrapper when user source is not replayable (e.g. Twitter)
- Further, it would be nice if we allow users to switch between the different guarantees through APIs or dashboard.
This jira is to gather requirements and ideas from the community and users. The real work will be divided into subtasks and committed step by step.