  1. Jackrabbit Oak
  2. OAK-11184

Resolving a node with an unmerged or not-yet-visible revision as the only revision results in previous document scan (which can be expensive on root)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.72.0
    • Component/s: documentmk
    • Labels: None

    Description

      Consider a scenario with 2 cluster nodes (running DocumentNodeStore):

      • cluster node A with cluster node id 1
      • cluster node B with cluster node id 2

      Now cluster node A is doing a merge that includes changes (e.g. a property) on root. Such a merge consists of two updates to the DocumentStore, followed by the background operations:

      • step 1: the first update writes the actual property changes (it would also cover any documents other than root, but that is not relevant here). Say this happens with revision "rn-0-1".
      • step 2: the second update marks the revision as committed, which is done by adding the entry "rn-0-1" : "c" to "_revisions".
      • step 3: after the merge, the usual backgroundWrite (on A) and backgroundRead (on B) follow.
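
      The two updates can be pictured with a toy model of the root document. This is only a sketch: the class MergeUpdatesSketch, its methods and the property name "prop" are hypothetical and are not Oak's NodeDocument or DocumentStore API. It only shows that between step 1 and step 2 the property change exists without a commit value.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the root document: field name -> (revision -> value).
class MergeUpdatesSketch {
    final Map<String, Map<String, String>> doc = new TreeMap<>();

    private void put(String field, String revision, String value) {
        doc.computeIfAbsent(field, k -> new TreeMap<>()).put(revision, value);
    }

    // Step 1: write the property change, tagged with the revision of the commit.
    void applyChanges(String revision, String property, String value) {
        put(property, revision, value);
    }

    // Step 2: mark the revision as committed by adding "c" to "_revisions".
    void markCommitted(String revision) {
        put("_revisions", revision, "c");
    }

    // True once step 2 has been applied for the given revision.
    boolean hasCommitValue(String revision) {
        Map<String, String> revisions = doc.get("_revisions");
        return revisions != null && revisions.containsKey(revision);
    }
}
```

      A reader that looks at the document after applyChanges("rn-0-1", "prop", "v") but before markCommitted("rn-0-1") sees the change in the unmerged state.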

      At some point cluster node B reads the root node and calls getNodeAtRevision. The behavior differs slightly depending on when that read happens:

      • after step 1 it will read "rn-0-1" in an unmerged state (it has no commit value yet)
      • after step 2 it will read the revision in a not yet visible state

      Either way, the value for this revision resolves to null. As a result, B greedily reads through the previous documents looking for a split-away property value, only to find nothing. If there are many previous documents, which is likely on root, this is a significant performance hit.
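
      The cost can be illustrated with a simplified read path. This is a sketch under assumptions: PreviousDocScanSketch and its resolve method are hypothetical, revision ordering is not modeled, and the real logic in Oak is considerably more involved. It only shows that when the single revision in the main document is not usable, every previous document gets read before null is returned.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of resolving a property value on the reading cluster node.
class PreviousDocScanSketch {
    // Counts how many previous documents had to be read.
    int previousDocsRead = 0;

    /**
     * localRevisions: revision -> value, as stored in the main document.
     * visible:        revisions that are committed and visible to the reader.
     * previousDocs:   split-away revision -> value maps, one per previous document.
     */
    String resolve(Map<String, String> localRevisions, Set<String> visible,
                   List<Map<String, String>> previousDocs) {
        for (Map.Entry<String, String> e : localRevisions.entrySet()) {
            if (visible.contains(e.getKey())) {
                return e.getValue(); // resolved from the main document, no scan needed
            }
        }
        // Nothing usable in the main document: fall back to the previous documents.
        for (Map<String, String> prev : previousDocs) {
            previousDocsRead++;
            for (Map.Entry<String, String> e : prev.entrySet()) {
                if (visible.contains(e.getKey())) {
                    return e.getValue();
                }
            }
        }
        return null; // scanned everything and found nothing
    }
}
```

      With a main document that only holds "rn-0-1", an empty visible set and 1000 previous documents that do not contain the property, resolve returns null after reading all 1000 of them. Once "rn-0-1" is visible, the same call returns from the main document without reading any.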

      This situation persists until a full backgroundWrite/backgroundRead cycle has completed, i.e. until step 3 above is done.

      Now, backgroundRead requires the exclusive lock of backgroundOperationLock as part of updating to a fresh main root. If anyone else on B currently holds that lock, backgroundRead has to wait.

      A regular merge, on the other hand, acquires the non-exclusive (read) lock of backgroundOperationLock.

      If there are a number of threads "ahead" of backgroundRead all acquiring the read-lock of backgroundOperationLock, the backgroundRead will have to wait until all commits are done.
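
      The lock interaction can be reproduced with a plain java.util.concurrent read/write lock. The sketch below assumes backgroundOperationLock behaves like a ReentrantReadWriteLock; the class BackgroundLockSketch and its run method are hypothetical and only simulate merges as threads that hold the read lock until they are told to finish.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BackgroundLockSketch {
    /** Returns { write lock acquired while merges run, write lock acquired after merges finished }. */
    static boolean[] run(int mergeThreads) {
        // Stand-in for backgroundOperationLock.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch allHoldReadLock = new CountDownLatch(mergeThreads);
        CountDownLatch finishMerges = new CountDownLatch(1);
        Thread[] merges = new Thread[mergeThreads];
        for (int i = 0; i < mergeThreads; i++) {
            merges[i] = new Thread(() -> {
                lock.readLock().lock(); // a merge takes the non-exclusive lock
                try {
                    allHoldReadLock.countDown();
                    finishMerges.await(); // simulates a slow merge
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.readLock().unlock();
                }
            });
            merges[i].start();
        }
        try {
            allHoldReadLock.await();
            // backgroundRead would need the exclusive lock here; it is not available.
            boolean during = lock.writeLock().tryLock();
            if (during) lock.writeLock().unlock();
            finishMerges.countDown();
            for (Thread t : merges) t.join();
            // All merges are done, so the exclusive lock can now be taken.
            boolean after = lock.writeLock().tryLock();
            if (after) lock.writeLock().unlock();
            return new boolean[] { during, after };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

      The sketch uses tryLock so that it never blocks; a blocking lock() call in its place would simply wait until the last merge releases its read lock, which is the waiting described above.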

      So if B is in such a situation, and every merge that updates root has to go through the previous documents, each merge holds its read lock for longer, backgroundRead waits longer in turn, and the overall delay can be significant.

      (This issue can also happen on any other node - but it is only a problem if there are many previous documents. And root usually does have many, hence the problem is primarily on root)

          People

            Assignee: Stefan Egli (stefanegli)
            Reporter: Stefan Egli (stefanegli)