  1. Jackrabbit Oak
  2. OAK-2284

Better locality for blobs collections over sharding

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.2
    • Fix Version/s: None
    • Component/s: blob, mongomk

      Description

      Currently, when Oak is used with MongoMK for blob storage, the chunks of a single binary stream can easily end up scattered across all the shards.

      This is not ideal, because reading an individual file then generates a large number of scatter-gather queries across the shards.

      To allow better locality, I propose adding another field called _anchor.
      The anchor is generated from the timestamp at which storage of the file begins, with the time fields written in inverse order (milliseconds first, hours last):

      import java.text.SimpleDateFormat;
      import java.util.Date;

      // Milliseconds, seconds, minutes, hours (HH)
      SimpleDateFormat sdf = new SimpleDateFormat("SSSssmmHH");
      // Format the timestamp once, then store the parsed integer for more storage efficiency
      String a = sdf.format(new Date());
      int _anchor = Integer.parseInt(a);
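
As a minimal sketch of the anchor computation described above, the class below wraps it in a helper that takes the upload start time as a parameter, so the same value can be reused for every chunk of one binary. The names AnchorSketch and anchorFor are hypothetical and not part of Oak. Pinning the formatter to UTC is also an assumption: SimpleDateFormat otherwise uses the JVM default time zone, so nodes in different zones would produce different HH digits.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class AnchorSketch {
    // Hypothetical helper: derives the anchor from the moment storage of the file began.
    static int anchorFor(Date uploadStart) {
        // Milliseconds, seconds, minutes, hours (24h), as in the proposal.
        SimpleDateFormat sdf = new SimpleDateFormat("SSSssmmHH");
        // Assumption: a fixed zone so that every cluster node formats the same way.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        // The largest possible value, 999595923, still fits in a 32-bit int.
        return Integer.parseInt(sdf.format(uploadStart));
    }

    public static void main(String[] args) {
        // Compute the anchor once, at the start of the upload, and reuse it for all chunks.
        int anchor = anchorFor(new Date());
        System.out.println("_anchor = " + anchor);
    }
}
```

Because the milliseconds come first, uploads that start close together in time still receive widely spread anchor values, while all chunks of one upload share a single value.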
      

      The new _anchor field should be part of the shard key, which means it also needs to be indexed alongside _id.
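
To illustrate why putting the anchor in the shard key helps, here is a small sketch that orders chunk documents the way a compound key on _anchor and then _id would. The exact key layout is an assumption, since the issue only says that _anchor should be part of the shard key together with _id. The Chunk class and all sample values are made up for illustration. With range-based sharding, documents that are adjacent in key order tend to land in the same key range, so chunks sharing an anchor would stay together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ShardKeyOrderSketch {
    // Hypothetical stand-in for a blob chunk document; only the two key fields are modelled.
    static final class Chunk {
        final int anchor;  // proposed _anchor value, shared by all chunks of one binary
        final String id;   // the chunk's _id (made-up values below)
        Chunk(int anchor, String id) { this.anchor = anchor; this.id = id; }
    }

    // Orders chunks as an assumed compound key { _anchor: 1, _id: 1 } would.
    static List<Chunk> sortByAnchorThenId(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingInt((Chunk c) -> c.anchor)
                .thenComparing((Chunk c) -> c.id));
        return sorted;
    }

    // Two binaries: one with three chunks (anchor 4030201), one with two (anchor 512450917).
    static List<Chunk> sample() {
        return Arrays.asList(
                new Chunk(4030201, "9f"), new Chunk(512450917, "5a"),
                new Chunk(4030201, "1c"), new Chunk(512450917, "c7"),
                new Chunk(4030201, "e2"));
    }

    public static void main(String[] args) {
        for (Chunk c : sortByAnchorThenId(sample())) {
            System.out.println(c.anchor + " " + c.id);
        }
    }
}
```

Sorted by _id alone, the sample chunks of the two binaries interleave. Sorted by anchor first, the three chunks of the first binary come out together, ahead of the two chunks of the second.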

      A pull request is in the making!

      N.

            People

            • Assignee: Unassigned
            • Reporter: Norberto Leite (nleite)
            • Votes: 0
            • Watchers: 1

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated: 96h
                Remaining: 96h
                Logged: Not Specified