Description
In MongoDocumentStore#query we make a call like
db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" }, _modified: { $gte: 1405085300 } }).sort({_id:1})
Further we have indexes
- {_id : 1}
- {_modified : -1}
Depending on scenario one of the two index would perform better
- If very few changes have happened in the time interval then _modified would perform better
- In other cases _id index would perform better
Ideally this should be decided by Mongo Query planner but it seems that at times it is not making the right choice. For example getting plan for the query with same shape yields following results
A - planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:19046849 nscannedObjects:19046849 scanAndOrder:1 keyUpdates:0 numYields:41359 locks(micros) r:103894585 nreturned:2 reslen:1173 61334ms B - 2014-07-10T15:59:19.994-0400 [conn1365] query aem-author.nodes query: {$query: { _id:{ $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" }, _modified:{ $gte: 1404245090 }}, $orderby:{ _id: 1 }} planSummary: IXSCAN{ _modified: -1 } ntoreturn:0 ntoskip:0 nscanned:15626664 nscannedObjects:15626664 scanAndOrder:1 keyUpdates:0 numYields:324016 locks(micros) r:453960384 nreturned:1 reslen:582 972125ms C - 2014-07-11T15:22:44.579-0400 [conn1387] query aem-author.nodes query: { $query: { _id: { $gt: "4:/oak:index/uuid/:index/", $lt: "4:/oak:index/uuid/:index0" }, _modified: { $gte: 1405106530 } }, $orderby: { _id: 1 }, $hint: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoreturn:0 ntoskip:0 nscanned:701631 nscannedObjects:701631 keyUpdates:0 numYields:42 locks(micros) r:4471112 nreturned:17 reslen:6557 2540ms
So Mongo used a BasicCursor, _id index, _modified index in different runs. Now lets see whats the difference between time of query and _modified and nscanned
- B - 15626664, 8 days, _modified - Should have used _id index as duration is too large
- C - 701631, 34 sec, _id - Might have used modified index as duration to check for is less
Mongo 2.6 uses heuristics to determine which plan to use. As mentioned in SERVER-13866
Some background: to choose a query plan to use for a given query when multiple candidate plans exist, the query engine runs each candidate plan and then picks the plan that produced the most results during a trial period on a subset of the data to be scanned. The winning query plan is then cached, and used for subsequent queries of the same shape until the cache entry is invalidated (which happens under certain conditions, such as when the data distribution in the collection changes sufficiently or when the chosen query plan performs consistently much worse than it did during initial selection).
So at times Mongo might make a right guess at times not! So we need to determine ways such that right index is used by Mongo to execute a given query
Attachments
Attachments
Issue Links
- relates to
-
OAK-1970 Optimize the diff logic for large number of children case
- Closed