Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
Currently in StreamingJoinOperator (non-window) in case of OUTER JOIN the OuterJoinRecordStateView is used to store additional field - the number of associations for every record. This leads to store additional Tuple2 and Integer data for every record in outer state.
This functionality is used only for sending:
- -D[nullPaddingRecord] in case of first Accumulate record
- +I[nullPaddingRecord] in case of last Revoke record
The overhead of storing additional data and updating the counter for associations can be avoided by checking the input state for these events.
The proposed solution can be found here - https://github.com/rovboyko/flink/commit/1ca2f5bdfc2d44b99d180abb6a4dda123e49d423
According to the nexmark q20 test (changed to OUTER JOIN) it could increase the performance up to 20%:
- Before:
- After: