Details
Type: New Feature
Status: Closed
Priority: Critical
Resolution: Implemented
Description
[Design doc]:
https://docs.google.com/document/d/18XlGPcfsGbnPSApRipJDLPg5IFNGTQjnz7emkVpZlkw
[Introduction]:
"Retraction" is an important building block for data streaming: it refines results that were fired early. Early firing is common and widely used in many streaming scenarios, for instance window-less (unbounded) aggregates and stream-stream inner joins, as well as windowed aggregates and stream-stream inner joins with early firing. Two main cases require retraction: 1) an update on a keyed table (the key is either a primary key (PK) on the source table, or a group key/partition key in an aggregate); 2) when dynamic windows (e.g., session windows) are in use, a new value may replace more than one previous window because of window merging.
To the best of our knowledge, retraction of early-fired streaming results has not been solved in practice before. In this proposal, we develop a retraction solution and explain how it works for the "update on the keyed table" problem. The same solution can be readily extended to dynamic window merging, because the key component of retraction, namely how to refine an early-fired result, is the same across these problems.
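To make the "update on the keyed table" case concrete, here is a minimal Python sketch of a count-of-counts query. It is not Flink code and does not reflect the design doc's actual interfaces: the function names and the "retract"/"accumulate" message tags are hypothetical, chosen only for illustration. The upstream aggregate fires a result early for every input record; when a key's count changes, it first retracts the previously emitted value so that the downstream aggregate can undo it.

```python
from collections import defaultdict


def keyed_count(events):
    """Upstream aggregate: running count per key, fired on every record.

    When a key already has an emitted result, a ("retract", key, old) message
    is sent before the ("accumulate", key, new) message. (Hypothetical tags.)
    """
    counts = {}
    for key in events:
        old = counts.get(key)
        new = (old or 0) + 1
        counts[key] = new
        if old is not None:
            yield ("retract", key, old)
        yield ("accumulate", key, new)


def count_of_counts(messages):
    """Downstream aggregate: how many keys currently have each count."""
    freq = defaultdict(int)
    for kind, _key, cnt in messages:
        if kind == "accumulate":
            freq[cnt] += 1
        else:  # "retract": undo the contribution of an earlier result
            freq[cnt] -= 1
            if freq[cnt] == 0:
                del freq[cnt]
    return dict(freq)


words = ["a", "b", "a", "c", "a"]

# With retraction: "a" ends at count 3, "b" and "c" at count 1.
with_retraction = count_of_counts(keyed_count(words))

# Without retraction: stale early results for "a" are never undone.
without_retraction = count_of_counts(
    m for m in keyed_count(words) if m[0] == "accumulate"
)

print(with_retraction)     # {1: 2, 3: 1}
print(without_retraction)  # {1: 3, 2: 1, 3: 1}
```

Without the retract messages, the downstream result still counts the intermediate values 1 and 2 for key "a", which is the kind of incorrect early-fired result that retraction is meant to refine.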
[Proposed Jiras]:
Implement decoration phase for rewriting predicated logical plan after volcano optimization phase
Implement optimizer for retraction and turn on retraction for over window aggregate
Implement and turn on the retraction for grouping window aggregate
Implement and turn on retraction for table source
Implement and turn on retraction for table sink
Implement and turn on retraction for stream-stream inner join
Implement the retraction for the early firing window
Implement the retraction for the dynamic window with early firing