Details
-
Sub-task
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.6.0
Description
Currently the RecordWriter emits output into multi channels via ChannelSelector or broadcasts output to all channels directly. Each channel has a separate RecordSerializer for serializing outputs, that means the output will be serialized as many times as the number of selected channels.
As we know, data serialization is a high cost operation, so we can get good benefits by improving the serialization only once.
I would suggest the following changes for realizing it.
- Only one RecordSerializer is created in RecordWriter for all the channels.
- The output is serialized into the intermediate data buffer only once for different channels.
- The intermediate serialization results are copied into different {{BufferBuilder}}s for different channels.
An additional benefit by using a single serializer for all channels is that we get a potentially significant reduction on heap space overhead from fewer intermediate serialization buffers (only once we got over 5MiB, these buffers were pruned back to 128B!).
Attachments
Issue Links
- causes
-
FLINK-10537 Network throughput performance regression after broadcast changes
- Closed
- supercedes
-
FLINK-4893 Use single RecordSerializer per RecordWriter
- Closed
-
FLINK-5060 only serialise records once in RecordWriter#emit and RecordWriter#broadcastEmit
- Closed
- links to