Details

    • Users can now set taskmanager.network.blocking-shuffle.compression.enabled to true to enable data compression for blocking shuffle.
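
    As a minimal illustration, the option could be set in the cluster configuration like this. Only the key itself comes from this issue; placing it in flink-conf.yaml is an assumption about the deployment.

```yaml
# flink-conf.yaml (assumed location of the cluster configuration)
# Enable compression of blocking shuffle data.
taskmanager.network.blocking-shuffle.compression.enabled: true
```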

    Description

      Currently, the blocking shuffle writer writes raw output data to disk without compression. In IO-bound scenarios, this can be optimized by compressing the output data. We propose introducing a compression mechanism together with a config option that lets users decide whether to compress the shuffle data. We have already implemented compression in our internal Flink version; here are some key points:

      1. Where to compress/decompress?

      Compress on the upstream (producer) side and decompress on the downstream (consumer) side.

      2. Which thread performs compression/decompression?

      The task threads perform both compression and decompression.

      3. Data compression granularity.

      Per buffer.

      4. How to handle the case where the data becomes larger after compression?

      Give up compression in this case and introduce an extra flag that identifies whether the data was compressed. As a result, the output may be a mixture of compressed and uncompressed buffers.
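
      Points 3 and 4 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation proposed in this issue: the class and method names (BufferCompressionSketch, EncodedBuffer, encode, decode) are hypothetical, and java.util.zip.Deflater stands in for whatever codec is actually used, which the issue does not specify.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: per-buffer compression with an "is compressed" flag.
// None of these names are Flink classes; Deflater is an assumed stand-in codec.
final class BufferCompressionSketch {

    /** One encoded buffer: the payload, the flag the reader needs, and the original size. */
    static final class EncodedBuffer {
        final boolean compressed;
        final byte[] data;
        final int originalLength;

        EncodedBuffer(boolean compressed, byte[] data, int originalLength) {
            this.compressed = compressed;
            this.data = data;
            this.originalLength = originalLength;
        }
    }

    /** Compresses a single buffer; falls back to the raw bytes if compression does not shrink it. */
    static EncodedBuffer encode(byte[] raw) {
        if (raw.length == 0) {
            return new EncodedBuffer(false, raw, 0);
        }
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(raw);
            deflater.finish();
            byte[] out = new byte[raw.length];
            int n = deflater.deflate(out);
            // Give up if the result did not fit or is not strictly smaller than the input.
            if (!deflater.finished() || n >= raw.length) {
                return new EncodedBuffer(false, raw, raw.length);
            }
            return new EncodedBuffer(true, Arrays.copyOf(out, n), raw.length);
        } finally {
            deflater.end();
        }
    }

    /** Restores the original bytes, decompressing only when the flag is set. */
    static byte[] decode(EncodedBuffer buffer) {
        if (!buffer.compressed) {
            return buffer.data;
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(buffer.data);
            byte[] out = new byte[buffer.originalLength];
            int n = 0;
            while (n < out.length && !inflater.finished()) {
                int read = inflater.inflate(out, n, out.length - n);
                if (read == 0) {
                    throw new IllegalStateException("Truncated or corrupt compressed buffer");
                }
                n += read;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt compressed buffer", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[4096];            // all zeros, compresses well
        byte[] noise = new byte[4096];
        new Random(42).nextBytes(noise);               // random bytes, effectively incompressible

        for (byte[] raw : new byte[][] {repetitive, noise}) {
            EncodedBuffer encoded = encode(raw);
            boolean roundTrip = Arrays.equals(raw, decode(encoded));
            System.out.println("compressed=" + encoded.compressed
                    + " storedBytes=" + encoded.data.length
                    + " roundTrip=" + roundTrip);
        }
    }
}
```

      Because the flag travels with each buffer, a reader can handle a stream in which compressed and uncompressed buffers are interleaved, which is the mixture described in point 4.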

       

      We'd like to contribute blocking shuffle data compression to Flink if there is interest.

       

            People

              kevin.cyj Yingjie Cao