OAK-2140: Segment Compactor will not compact binaries > 16k


Description

Compaction relies on the SegmentBlob#clone method when a binary is being processed, but it looks like the #clone contract is not fully enforced for streams that qualify as 'long values' (> 16k, if I read the code correctly).
What happens is that the stream is initially persisted as chunks referenced from a ListRecord. When compaction calls #clone, it gets back the original list of record ids, which then gets referenced from the compacted node state [0]. This makes compaction on large binaries ineffective: the bulk segments never move from the location where they were originally created, unless the referencing node gets deleted.

I think the original design was set up to prevent large binaries from being copied over, but given the size problem we have now it might be a good time to reconsider this approach.

[0] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentBlob.java#L75
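
To illustrate the effect, here is a minimal, self-contained sketch of the behaviour described above. CloneSketch, RecordId, cloneBlob and LONG_VALUE_THRESHOLD are hypothetical stand-ins, not the actual Oak API; the point is only that the long-value branch hands back the original record ids instead of copying the chunks into the target store.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the behaviour described above; the real
// logic lives in org.apache.jackrabbit.oak.plugins.segment.SegmentBlob and
// differs in detail.
public class CloneSketch {

    // Stand-in for a record id pointing into a segment.
    record RecordId(String segmentId, int offset) {}

    static final int LONG_VALUE_THRESHOLD = 16 * 1024; // the ~16k cut-off

    // Mimics the #clone behaviour: small values are re-written into the
    // target store (new record ids), but long values short-circuit and
    // return the original list of record ids unchanged.
    static List<RecordId> cloneBlob(List<RecordId> chunks, long length) {
        if (length < LONG_VALUE_THRESHOLD) {
            List<RecordId> copy = new ArrayList<>();
            for (RecordId id : chunks) {
                // pretend we re-serialised the chunk into a new segment
                copy.add(new RecordId("compacted-segment", id.offset()));
            }
            return copy;
        }
        // long value: the original record ids leak into the compacted node
        // state, pinning the old bulk segments until the node is deleted
        return chunks;
    }

    public static void main(String[] args) {
        List<RecordId> original = List.of(
                new RecordId("old-segment", 0),
                new RecordId("old-segment", 4096));
        List<RecordId> cloned = cloneBlob(original, 64 * 1024);
        System.out.println(cloned == original); // prints true for a long value
    }
}
{code}

Running this prints true for the 64k value: the "clone" is the very same list of record ids, so the compacted node state keeps the old bulk segments reachable.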

Attachments

OAK-2140.patch (1 kB, Michael Dürig)


People

Assignee: Alex Deparvu (stillalex)
Reporter: Alex Deparvu (stillalex)
Votes: 0
Watchers: 3
