Description
The discussion is based on how to support tuple and/or time based window operators in Samza physical operator layer.
Here are the few observations:
- Tuple represents the “physical ordering” of events while time-based window has semantic meanings to users
- Total ordering between tuples are possible within Samza/Kafka given a deterministic MessageSelector on all input streams and offsets within each stream
- No matter whether tuple or time is used to measure the window size, the window termination condition is needed to close a window to avoid the job to be wedged forever
The following questions have to be answered to fully implement a window operator:
- how to determine that a window is closed and no new tuples will be added?
- For tuple based, how do we close the window if messages do not come or get delayed?
- For time based, how do we close the window if
- the messages are not strictly in order w/ the time?
- the message w/ timestamp greater than the window boundary does not come or gets delayed?
Attachments
Attachments
1.
|
Implement window metadata store | In Progress | Yi Pan | |
2.
|
Implement message store for window operator | Open | Unassigned | |
3.
|
A time-based window operator for join | Open | Unassigned | |
4.
|
Stream-to-stream join operator w/ time-based window | Open | Guozhang Wang |