Uploaded image for project: 'Jackrabbit Oak'
  1. Jackrabbit Oak
  2. OAK-1907

Better cost estimates for traversal, property, and ordered indexes

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.0, 1.0.1, 1.0.2
    • 1.1.3
    • query
    • None

    Description

      Currently, cost estimates of traversal, property index, and ordered index don't take the number of nodes into account, if there are more than about 100 nodes. This is problematic because in many cases, the wrong index is used (because of incorrect cost estimate).

      To get a better estimate, a very rough estimate on the number of child nodes below a given path is needed.

      One idea is: when adding a node, if Math.random() < 0.00001, add a hidden, randomly named property (for example called ":count-xyz" where xyz is a uuid, value 100'000) to the parents of that node, so that we know there are probably more than 100'000 nodes below a given path. When removing a node, with the same algorithm add a hidden property (":count-xyz", value -100'000). That should result in a slowdown of less than 0.01%, but should allow us much better cost estimates. Those properties could be consolidated asynchronously if needed.

      Attachments

        1. probability_50t_1m.png
          90 kB
          Thomas Mueller
        2. OAK-1907.diff
          25 kB
          Davide Giannella
        3. ApproxCount.java
          3 kB
          Thomas Mueller

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            thomasm Thomas Mueller
            thomasm Thomas Mueller
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment