[OAK-3865] New strategy to optimize secondary reads - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.5, 1.6.0
Component/s: mongomk
Labels:
- performance

Description

Introduction

In the current trunk we'll only read document D from the secondary instance if:
(1) we have the parent P of document D cached and
(2) the parent hasn't been modified in 6 hours.
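The two conditions above can be sketched as a single predicate. This is only an illustration, not the actual Oak code: the class name `LegacySecondaryReadRule`, the method signature and the use of `Optional` to model "P is not cached" are assumptions made for this example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper illustrating the current rule; not part of Oak.
final class LegacySecondaryReadRule {

    // The 6-hour threshold mentioned in the description.
    static final Duration MAX_PARENT_AGE = Duration.ofHours(6);

    // cachedParentLastModified is empty when the parent P is not in the cache.
    static boolean canReadFromSecondary(Optional<Instant> cachedParentLastModified, Instant now) {
        return cachedParentLastModified
                // (2) the parent must not have been modified within the last 6 hours
                .map(modified -> Duration.between(modified, now).compareTo(MAX_PARENT_AGE) >= 0)
                // (1) without a cached parent the read always goes to the primary
                .orElse(false);
    }
}
```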

~~OAK-2106~~ tried to optimise (2) by estimating the replication lag from MongoDB replica set stats. That proved unreliable, so the second approach was to read the last revisions directly from each Mongo instance. If P was last modified before the last revisions on all secondary Mongos, then the secondary can be used.

The main problem with this approach is that we still need P to be in the cache. I think we need another way to optimise secondary reads, as right now only about 3% of requests go to the secondary, which is bad especially for the global-clustering case (Mongo and Oak instances across the globe). The optimisation provided in ~~OAK-2106~~ doesn't make things much better and may introduce some consistency issues.

Proposal - tldr version

Oak will remember the last revision it has ever seen. At the same time, it'll query each secondary Mongo instance for the root revision stored there. If all secondary instances have a root revision >= the last revision seen by a given Oak instance, it's safe to use the secondary read preference.

Proposal

I had the following constraints in mind when preparing this:
1. Let's assume we have a sequence of commits with revisions R1, R2 and R3 modifying nodes N1, N2 and N3. If we have already read N1 at revision R2, then reading from a secondary shouldn't return an older revision (e.g. R1).
2. If an Oak instance modifies a document, then reading from a secondary shouldn't return the old version (from before the modification).

So, let's have two maps:

- M1: the most recent document revision read from Mongo, for each cluster id,
- M2: the oldest lastRev value of the root document, for each cluster id, across all the secondary instances.

Maintaining M1:
On every read from Mongo we'll check whether the lastRev for any cluster id is newer than the corresponding M1 entry. If so, we'll update M1. On every write we'll store the saved revision in M1 under the current cluster id.
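A minimal sketch of the M1 bookkeeping, under simplifying assumptions: revisions are modelled as plain `long` timestamps rather than Oak `Revision` objects, and the class `SeenRevisions` with its method names is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical M1: newest revision seen so far, keyed by cluster id.
final class SeenRevisions {

    private final Map<Integer, Long> newestSeen = new ConcurrentHashMap<>();

    // Called for every document read from Mongo with that document's lastRev entries.
    void onRead(Map<Integer, Long> documentLastRevs) {
        documentLastRevs.forEach(this::record);
    }

    // Called for every write performed by the local Oak instance.
    void onWrite(int localClusterId, long savedRevision) {
        record(localClusterId, savedRevision);
    }

    // Keeps the larger of the stored and the new revision, so M1 never moves backwards.
    private void record(int clusterId, long revision) {
        newestSeen.merge(clusterId, revision, Long::max);
    }

    // Read-only copy for comparison against M2.
    Map<Integer, Long> snapshot() {
        return Collections.unmodifiableMap(new HashMap<>(newestSeen));
    }
}
```

Using `merge` with `Long::max` makes each update atomic per cluster id, which matters if reads and writes are recorded from several threads.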

Maintaining M2:
M2 should be updated periodically. Such a mechanism is already prepared in the ~~OAK-2106~~ patch.
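One way the periodic job could derive M2 from the root documents of the secondaries is sketched below. It is an assumption-laden illustration, not the mechanism from the patch: revisions are plain `long` values, the class `ReplicatedRevisions` is hypothetical, and a cluster id missing on any secondary is dropped from the result (treated as "not yet replicated").

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical M2 builder: oldest root lastRev per cluster id across all secondaries.
final class ReplicatedRevisions {

    // Each list element holds the root document's lastRev map as read from one secondary.
    static Map<Integer, Long> oldestLastRevs(List<Map<Integer, Long>> rootLastRevsPerSecondary) {
        Map<Integer, Long> oldest = new HashMap<>();
        boolean first = true;
        for (Map<Integer, Long> lastRevs : rootLastRevsPerSecondary) {
            if (first) {
                oldest.putAll(lastRevs);
                first = false;
                continue;
            }
            // Assumption: a cluster id unknown to any secondary is not considered replicated.
            oldest.keySet().retainAll(lastRevs.keySet());
            for (Map.Entry<Integer, Long> e : oldest.entrySet()) {
                e.setValue(Math.min(e.getValue(), lastRevs.get(e.getKey())));
            }
        }
        return oldest;
    }
}
```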

The method deciding whether we can read from the secondary instance should compare the two maps. If every entry in M2 is at least as new as the corresponding entry in M1, the secondary instances contain a repository state at least as new as the one we have already accessed, and therefore it's safe to read from a secondary.
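The comparison could look roughly like the sketch below. It uses `>=` as in the tl;dr version, models revisions as `long` values, and treats a cluster id that is present in M1 but absent from M2 as unsafe; all three choices, as well as the class name `SecondaryReadCheck`, are assumptions made for this example.

```java
import java.util.Map;

// Hypothetical decision method comparing M1 (seen) with M2 (replicated).
final class SecondaryReadCheck {

    static boolean secondariesSafe(Map<Integer, Long> seen, Map<Integer, Long> replicated) {
        for (Map.Entry<Integer, Long> e : seen.entrySet()) {
            Long onSecondaries = replicated.get(e.getKey());
            // Unsafe if the secondaries lag behind what this Oak instance has already seen.
            if (onSecondaries == null || onSecondaries < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

With M1 = {1: 100, 2: 70}, the check passes for M2 = {1: 100, 2: 80} and fails for M2 = {1: 99, 2: 80}, because cluster 1 has not caught up on at least one secondary.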

Regarding the documents modified by the local Oak instance, we should remember all the locally modified paths and their revisions and use the primary Mongo to access them as long as the changes are not replicated to all the secondaries. When the secondaries are up to date with a modification, we can remove it from the local-changes collection.
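The local-changes collection might be kept as a map from path to the revision of the last local write, as in this sketch. The class `LocalChanges`, its method names and the `long` revision model are hypothetical; the purge step assumes the caller passes the local cluster id's entry from M2.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker for paths modified by the local Oak instance.
final class LocalChanges {

    private final Map<String, Long> pending = new ConcurrentHashMap<>();

    // Remember the newest local revision written to the given path.
    void recordWrite(String path, long revision) {
        pending.merge(path, revision, Long::max);
    }

    // Paths with unreplicated local changes must be read from the primary.
    boolean mustReadFromPrimary(String path) {
        return pending.containsKey(path);
    }

    // Drop every change that all secondaries have already replicated.
    void onSecondariesAdvanced(long replicatedLocalRevision) {
        pending.values().removeIf(rev -> rev <= replicatedLocalRevision);
    }
}
```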

The attached image diagram.png illustrates the idea.

Attachments

- diagram.png (15/Jan/16 11:53, 386 kB, Tomek Rękawek)
- clustered-oak-setup-improvements.pdf (08/Jun/16 12:45, 403 kB, Tomek Rękawek)
- OAK-3865.patch (15/Jun/16 11:31, 66 kB, Tomek Rękawek)
- ReadDeepTreeNoCacheTest.patch (16/Jun/16 13:56, 10 kB, Tomek Rękawek)

Issue Links

breaks

OAK-4912 MongoDB: ReadPreferenceIT.testMongoReadPreferencesForLocalChanges() occasionally fails

Closed

is related to

OAK-4486 [IT][Failures] testPreferenceConversion, testMongoReadPreferencesWithAge

Closed

relates to

OAK-8938 Oak run recovery fails when running on mongo replicaSet with auth enabled

Resolved

supercedes

OAK-2106 Optimize reads from secondaries

Resolved

People

Assignee: Tomek Rękawek

Reporter: Tomek Rękawek

Votes: 0

Watchers: 7

Dates

Created: 12/Jan/16 14:07

Updated: 09/Mar/20 14:28

Resolved: 06/Jul/16 11:03