  1. Jackrabbit Oak
  2. OAK-2896

Putting many elements into a map results in many small segments.


Details

    Description

      There is an issue with how the HAMT implementation (SegmentWriter.writeMap()) interacts with the limit of 256 segment references when putting many entries into a map: the limit is regularly reached once the map contains about 200k entries. At that point segments get flushed prematurely, which results in more segments, thus more references, and thus even smaller segments. It is common for segments to be as small as 7k, with a tar file containing up to 35k segments. This is problematic because at this point handling the segment graph becomes expensive in both memory and CPU. I have seen persisted segment graphs as big as 35M where the usual size is a couple of kilobytes.
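
The effect of the reference limit on segment size can be illustrated with a toy model. This is only a sketch and not Oak's SegmentWriter: the class SegmentFlushModel, its methods, the 32-byte record size, and the 256 KiB size cap are all assumptions made for illustration. Only the limit of 256 references comes from the description above. The model flushes the current segment when either limit would be exceeded, and compares a workload that references few other segments with one that references many.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a segment buffer; hypothetical, not Oak's implementation. */
public class SegmentFlushModel {
    // Limit named in the issue description.
    static final int MAX_REFERENCES = 256;
    // Assumed size cap, chosen only for this illustration.
    static final int MAX_SEGMENT_BYTES = 256 * 1024;

    private final Set<Integer> referenced = new HashSet<>();
    private int bytes = 0;
    private int flushedSegments = 0;
    private long flushedBytes = 0;

    /** Buffers a record that points into the segment identified by targetSegment. */
    void writeRecord(int recordBytes, int targetSegment) {
        boolean newReference = !referenced.contains(targetSegment);
        boolean sizeExceeded = bytes + recordBytes > MAX_SEGMENT_BYTES;
        boolean referencesExhausted = newReference && referenced.size() == MAX_REFERENCES;
        if (sizeExceeded || referencesExhausted) {
            flush(); // premature when triggered by the reference limit
        }
        referenced.add(targetSegment);
        bytes += recordBytes;
    }

    /** Closes the current segment, if it holds any data. */
    void flush() {
        if (bytes == 0) {
            return;
        }
        flushedSegments++;
        flushedBytes += bytes;
        bytes = 0;
        referenced.clear();
    }

    /** Writes the given number of records and returns the average segment size in bytes. */
    static long simulate(int records, int recordBytes, int distinctTargets) {
        SegmentFlushModel model = new SegmentFlushModel();
        for (int i = 0; i < records; i++) {
            model.writeRecord(recordBytes, i % distinctTargets);
        }
        model.flush();
        return model.flushedBytes / model.flushedSegments;
    }

    public static void main(String[] args) {
        // Few distinct targets: segments fill up to the size cap.
        System.out.println("few references:  " + simulate(102_400, 32, 10));
        // Many distinct targets: the reference limit forces early flushes.
        System.out.println("many references: " + simulate(102_400, 32, 10_000));
    }
}
```

In this model a segment that hits the reference limit can never hold more than 256 records, whatever the size cap is. The model leaves out the feedback described above, where the extra segments become additional reference targets for later writes.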

      As the HAMT map is used for storing the children of a node, this might have an adverse effect on nodes with many child nodes.

      The following code can be used to reproduce the issue:

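      // segmentStore and getTracker() are assumed to be provided by the
      // surrounding code. newHashMap and Stopwatch are Guava
      // (com.google.common.collect.Maps, com.google.common.base.Stopwatch).
      Random rnd = new Random(); // java.util.Random
      SegmentWriter writer = new SegmentWriter(segmentStore, getTracker(), V_11);
      MapRecord baseMap = null;

      // Runs until interrupted: each pass adds 1000 random entries on top of
      // the previous map and prints the map size and the time the write took.
      for (;;) {
          Map<String, RecordId> map = newHashMap();
          for (int k = 0; k < 1000; k++) {
              RecordId stringId = writer.writeString(String.valueOf(rnd.nextLong()));
              map.put(String.valueOf(rnd.nextLong()), stringId);
          }

          Stopwatch w = Stopwatch.createStarted();
          baseMap = writer.writeMap(baseMap, map);
          System.out.println(baseMap.size() + " " + w.elapsed());
      }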
      

      Attachments

        1. OAK-2896.png
          275 kB
          Michael Dürig
        2. OAK-2896.xlsx
          87 kB
          Michael Dürig
        3. size-dist.png
          158 kB
          Michael Dürig


          People

            mduerig Michael Dürig