FLINK-34050

RocksDB state has space amplification after rescaling with DeleteRange

Details

    Description

      FLINK-21321 uses deleteRange to speed up RocksDB rescaling; however, it causes space amplification in some cases.

      We can reproduce this problem with a WordCount job:

      1) Before rescaling, the stateful operator in the WordCount job has a parallelism of 2 and a full checkpoint size of 4G+.

      2) The job is then restarted with a parallelism of 4 (for the stateful operator), and the full checkpoint size of the new job becomes 8G+.

      3) After many successful checkpoints, the full checkpoint size is still 8G+.
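One way to read these numbers, as a rough sketch only: assume the state is spread evenly across subtasks and that each new subtask restores one complete old RocksDB instance whose clipped part is never physically removed. The class and method names below are hypothetical and are not Flink code.

```java
// Simplified size model, not Flink code.
// Assumptions: even state distribution; scale-up where each new subtask
// restores one full old RocksDB instance and the clipped data stays on disk.
final class RescaleSizeEstimate {

    /** On-disk bytes after rescaling if clipped data is never compacted away. */
    static long sizeAfterRescale(long totalBytes, int oldParallelism, int newParallelism) {
        long perOldInstance = totalBytes / oldParallelism;
        // Every new subtask keeps a full copy of one old instance on disk.
        return perOldInstance * newParallelism;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long before = 4 * gib;
        long after = sizeAfterRescale(before, 2, 4);
        // prints "before=4G after=8G"
        System.out.println("before=" + before / gib + "G after=" + after / gib + "G");
    }
}
```

Under these assumptions the live data is still 4G, so the reported 8G+ corresponds to roughly 2x space amplification.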

       

      The root cause of this issue is that the deleted keyGroupRange does not overlap with the current DB keyGroupRange, so new data written into RocksDB after rescaling is almost never compacted together with the deleted data (which belongs to other keyGroupRanges). As a result, the deleted data is rarely reclaimed by LSM compaction.
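The non-overlap can be illustrated with a small stand-alone sketch. It assumes a max parallelism of 128 and an even key-group split that mirrors the usual Flink assignment formula; the class and helper names are hypothetical and this is not the actual Flink implementation.

```java
// Illustration only: shows that the key groups a new subtask writes to
// never overlap the key groups that were clipped with deleteRange.
// Assumption: maxParallelism = 128, rescaling from parallelism 2 to 4.
final class KeyGroupOverlap {

    /** First key group owned by subtask {@code index} (assumed assignment formula). */
    static int rangeStart(int maxParallelism, int parallelism, int index) {
        return (index * maxParallelism + parallelism - 1) / parallelism;
    }

    /** Last key group (inclusive) owned by subtask {@code index}. */
    static int rangeEnd(int maxParallelism, int parallelism, int index) {
        return ((index + 1) * maxParallelism - 1) / parallelism;
    }

    /** True if the inclusive ranges [aStart, aEnd] and [bStart, bEnd] share a key group. */
    static boolean overlaps(int aStart, int aEnd, int bStart, int bEnd) {
        return aStart <= bEnd && bStart <= aEnd;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        // Old subtask 0 (parallelism 2) and new subtask 0 (parallelism 4).
        int oldEnd = rangeEnd(maxParallelism, 2, 0);
        int newStart = rangeStart(maxParallelism, 4, 0);
        int newEnd = rangeEnd(maxParallelism, 4, 0);
        // Key groups restored from the old DB but no longer owned are clipped.
        int deletedStart = newEnd + 1;
        int deletedEnd = oldEnd;
        // prints "owned=[0,31] deleted=[32,63] overlap=false"
        System.out.println("owned=[" + newStart + "," + newEnd + "] deleted=["
                + deletedStart + "," + deletedEnd + "] overlap="
                + overlaps(newStart, newEnd, deletedStart, deletedEnd));
    }
}
```

Since every new write lands in the owned range, files that hold only deleted key groups are rarely picked as compaction inputs together with new data.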

       

      This space amplification may hurt RocksDB read performance and increase disk space usage after rescaling. It looks like a regression introduced by using deleteRange for the rescaling optimization.

       

      To solve this problem, maybe we can invoke RocksDB.deleteFilesInRanges after deleteRange?

      public static void clipDBWithKeyGroupRange() {
        //.......
        // Flat list of begin/end key pairs, in the form deleteFilesInRanges expects.
        List<byte[]> ranges = new ArrayList<>();
        //.......
        deleteRange(db, columnFamilyHandles, beginKeyGroupBytes, endKeyGroupBytes);
        ranges.add(beginKeyGroupBytes);
        ranges.add(endKeyGroupBytes);
        //....

        // Drop SST files that lie entirely inside the deleted ranges (end key excluded).
        for (ColumnFamilyHandle columnFamilyHandle : columnFamilyHandles) {
          db.deleteFilesInRanges(columnFamilyHandle, ranges, false);
        }
      }
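To make the expected effect of this proposal concrete, here is a toy model of "drop every file that lies entirely inside the deleted range". It is not the RocksDB API: the SstFile class, the integer keys (standing in for key-group prefixes), and the sample files are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only, not RocksDB code: files are described by their
// smallest and largest key, and keys are plain ints for readability.
final class DeleteFilesInRangeModel {

    /** Hypothetical SST file covering keys [smallest, largest], both inclusive. */
    static final class SstFile {
        final String name;
        final int smallest;
        final int largest;

        SstFile(String name, int smallest, int largest) {
            this.name = name;
            this.smallest = smallest;
            this.largest = largest;
        }
    }

    /** Names of files that survive after dropping files entirely inside [begin, end). */
    static List<String> keptNames(List<SstFile> files, int begin, int end) {
        List<String> kept = new ArrayList<>();
        for (SstFile f : files) {
            boolean fullyCovered = f.smallest >= begin && f.largest < end;
            if (!fullyCovered) {
                kept.add(f.name);
            }
        }
        return kept;
    }

    /** Sample run: deleted range [32, 64) over four made-up files. */
    static List<String> demoKeptNames() {
        List<SstFile> files = Arrays.asList(
                new SstFile("a", 0, 20),   // only live keys
                new SstFile("b", 25, 40),  // straddles the boundary at 32
                new SstFile("c", 45, 63),  // entirely deleted
                new SstFile("d", 50, 60)); // entirely deleted
        return keptNames(files, 32, 64);
    }

    public static void main(String[] args) {
        // prints "[a, b]"
        System.out.println(demoKeptNames());
    }
}
```

In this model, file "b" survives even though part of it is covered by the deleted range, so files that straddle a range boundary would still depend on compaction to reclaim their deleted keys.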
      
      
      

       

      Attachments

        1. image-2024-01-10-21-23-48-134.png
          908 kB
          Jinzhong Li
        2. image-2024-01-10-21-24-10-983.png
          128 kB
          Jinzhong Li
        3. image-2024-01-10-21-28-24-312.png
          1.68 MB
          Jinzhong Li

            People

              lijinzhong Jinzhong Li