Jackrabbit Oak / OAK-7066

Active deletion blob list files can grow too large due to inlined blobs


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.14, 1.8.0
    • Component/s: lucene
    • Labels: None

      Description

      This is a follow-up to OAK-7052, where we noticed that the deleted-blob list files collected by the active deletion logic can grow very large because of inlined blobs.

      One potential approach (though it is not yet clear how to implement it) is to not actively delete inlined blobs at all.
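      To illustrate the idea of skipping inlined blobs, here is a minimal shell sketch that post-processes an existing list file. It assumes each line starts with the blob id followed by `|` (as the awk command in [0] implies) and treats any id longer than 500 characters as inlined, which is the same heuristic the stats below use; Oak's real inlining criterion may differ. `filter_inlined` and `MAX_ID_LEN` are hypothetical names, and the sketch does not change how Oak writes the file.

```shell
#!/bin/sh
# Hypothetical helper, not part of Oak: drop entries whose blob id looks
# inlined, so only the remaining ids would be actively deleted.
# Assumptions: each line is "blobId|...", and an id longer than
# MAX_ID_LEN characters (500, the threshold used in the stats) is treated
# as an inlined blob. Oak's actual inlining rule may differ.
MAX_ID_LEN=${MAX_ID_LEN:-500}

filter_inlined() {
  # Print only lines whose first '|'-separated field is at most max chars.
  awk -F'|' -v max="$MAX_ID_LEN" 'length($1) <= max + 0' "$@"
}

# When run with file arguments, write the filtered entries to stdout.
if [ "$#" -gt 0 ]; then
  filter_inlined "$@"
fi
```

      For example, saving this as a script and running it on blobs-1512664032264.txt with output redirected to a new file would keep only the entries whose ids are 500 characters or shorter.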

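      Here are some stats that might help us make a decision (based on the raw numbers collected at [0]). "Large" entries are those whose blob id is longer than 500 bytes; they make up roughly 58 to 60% of the entries but more than 98% of the total blob-id size: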

      file-name                large_lines  large_size  small_lines  small_size  small_lines/total_lines  small_size/total_size
      blobs-1512664032264.txt  245301       3310224358  173096       35473656    0.413712335413495        0.010602766852107
      blobs-1512698405656.txt  370373       4443957885  256775       52997864    0.409432861142824        0.011785275852845
      blobs-1512987450004.txt  660669       6214740439  461168       92017554    0.411082893504137        0.014590309966251
      blobs-1513130410963.txt  569083       5490965583  406756       80124598    0.416826956085994        0.014382211631264
      blobs-1513216819447.txt  69876        1413561892  46238        9221956     0.398212101899857        0.006481628262061
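      The two ratio columns follow directly from the four counts, with total = large + small. A small shell sketch (the function name `small_ratios` is hypothetical) that recomputes them from one line of the split output in [0]:

```shell
#!/bin/sh
# Recompute the table's ratio columns from one line of the split output,
# whose fields are: large_lines large_size small_lines small_size.
#   small_lines/total_lines = small_lines / (large_lines + small_lines)
#   small_size/total_size   = small_size  / (large_size  + small_size)
small_ratios() {
  awk '{ printf "%.6f %.6f\n", $3 / ($1 + $3), $4 / ($2 + $4) }'
}

# Example: the first row of the table.
echo "245301 3310224358 173096 35473656" | small_ratios
```

      For the first row this prints `0.413712 0.010603`, which matches the table after rounding to six decimal places.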

      [0]:
      file sizes

      repository/index/deleted-blobs$ ls -l blobs-151*
      -rw-r--r-- 1 root root 3369065620 Dec  8 01:59 blobs-1512664032264.txt
      -rw-r--r-- 1 root root 4532250073 Dec  9 01:59 blobs-1512698405656.txt
      -rw-r--r-- 1 root root 6370201955 Dec 13 01:59 blobs-1512987450004.txt
      -rw-r--r-- 1 root root 1916223582 Dec 13 11:52 blobs-1513130410963.txt
      

      number of entries

      repository/index/deleted-blobs$ wc -l blobs-151*
           418397 blobs-1512664032264.txt
           627148 blobs-1512698405656.txt
          1121837 blobs-1512987450004.txt
           308292 blobs-1513130410963.txt
          2475674 total
      

      number of entries and total blob-id length, split at a threshold of 500 bytes per blob id (output columns: large_lines large_size small_lines small_size)

      repository/index/deleted-blobs$ for i in blobs-151*; do echo "$i"; awk 'BEGIN {FS="|"} {len = length($1); if (len > 500) {large++; largeSize+=len} else {small++; smallSize+=len}} END {print large+0, largeSize+0, small+0, smallSize+0}' "$i"; done
      blobs-1512664032264.txt
      245301 3310224358 173096 35473656
      blobs-1512698405656.txt
      370373 4443957885 256775 52997864
      blobs-1512987450004.txt
      660669 6214740439 461168 92017554
      blobs-1513130410963.txt
      569083 5490965583 406756 80124598
      blobs-1513216819447.txt
      69876 1413561892 46238 9221956
      


    People

    • Assignee: Vikas Saurabh (catholicon)
    • Reporter: Vikas Saurabh (catholicon)
    • Votes: 0
    • Watchers: 4
