Uploaded image for project: 'Jackrabbit Oak'
  1. Jackrabbit Oak
  2. OAK-4323

Query engine: index cost formula incorrect when using "limit"

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • query
    • None

    Description

      As described in OAK-2081, the cost formula currently used in the query engine is not correct if "limit" is used, because it doesn't account for false positives.

      Example: Let's say there are two indexes:

      • color: 10000 nodes with color=red, but a bit slower (lets say a remote index), cost per entry is 1.5.
      • size: 20000 nodes with size=M, but a bit faster (lets say a local index), cost per entry is 1.

      Without limit, the index for "color" should be used as 10000 * 1.5 = 15000 is lower than 20000 * 1 = 20000.

      With limit=100, then we could calculate as follows: there are at most 10000 entries (according to index "color"), so the false positive rate of the "size" index is at least 50%. So cost of "color" is 100 * 1.5 = 150. Cost of "size" is 100 * 1 = 100, but with false positive rate of 50%, so cost is actually 200. Therefor, still the index "color" should be used.

      Attachments

        Issue Links

          Activity

            People

              thomasm Thomas Mueller
              thomasm Thomas Mueller
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: