Uploaded image for project: 'Jackrabbit Oak'
  1. Jackrabbit Oak
  2. OAK-3653

Incorrect last revision of cached node state

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.0.28, 1.2.11, 1.3.14, 1.4
    • documentmk
    • None

    Description

      While installing a package in one of the systems (Oak-1.0.16), we observed broken workflow model. Upon further investigation, it was found that baseVersion of the model was pointing to a node in version storage which wasn't visible on the instance. This node was available in mongo and could also be seen on a different instance of the cluster.

      The node was created on the same instance where it wasn't visible (cluster id - 3)

      Further investigation showed that for some (yet unknown) reason, lastRevision for cached node state of node's parent at revision where the node was created was stale (an older revision) which must have led to invalid children list and hence unavailable node.

      Attaching a few files which were captured during the investigation:

      • oplog.json - oplog for +-10s for r151191c7601-0-3
      • Capture.JPG- snapshot of groovy console output using script at [0] to list node children cache entry for parent
      • Capture1.JPG- snapshot of groovy console output using script at [1] to traverse to invisible node
      • Capture2.JPG- Almost same as Capture.jpg but this time for node cache instead (using script at [2])

      The node which wasn't visible in mongo:

      db.nodes.findOne({_id: "7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7"})
      {
              "_id" : "7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7",
              "_deleted" : {
                      "r151191c7601-0-3" : "false"
              },
              "_commitRoot" : {
                      "r151191c7601-0-3" : "0"
              },
              "_lastRev" : {
                      "r0-0-3" : "r151191c7601-0-3"
              },
              "jcr:primaryType" : {
                      "r151191c7601-0-3" : "\"nam:nt:version\""
              },
              "jcr:uuid" : {
                      "r151191c7601-0-3" : "\"40234f1d-2435-4c6b-9962-20af22b1c948\""
              },
              "_modified" : NumberLong(1447825270),
              "jcr:created" : {
                      "r151191c7601-0-3" : "\"dat:2015-11-17T21:41:14.058-08:00\""
              },
              "_children" : true,
              "jcr:predecessors" : {
                      "r151191c7601-0-3" : "[\"ref:6cecc77b-3020-4a47-b9cc-084a618aa957\"]"
              },
              "jcr:successors" : {
                      "r151191c7601-0-3" : "\"[0]:Reference\""
              },
              "_modCount" : NumberLong(1)
      }
      

      Last revision entry of cached node state of /jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e (parent) at revision r151191c7601-0-3 was r14f035ebdf8-0-3 (the last revs were propagated correctly up till root for r151191c7601-0-3).

      As a work-around, doing cache.invalidate(k) for node cache (inside the loop of script at [2]) revived the visibility of the node.

      Disclaimer: The issue was reported and investigated on a cluster backed by mongo running on Oak-1.0.16. But, the problem itself doesn't seem like any which have been fixed in later versions

      [0]:

      def printNodeChildrenCache(def path) {
        def session = osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
        try {
          def rootNode = session.getRootNode()
          def cache = rootNode.sessionDelegate.root.store.nodeChildrenCache
          def cacheMap = cache.asMap()
          cacheMap.each{k,v -> 
            if (k.toString().startsWith(path + "@")) {
              println "${k}:${v}"
            }
          }
        } finally {
          session.logout()
        }
      }
       
      printNodeChildrenCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
      

      [1]:

      def traverse(def path) {
        def session = osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
        try {
          def rootNode = session.getRootNode()
          def nb = rootNode.sessionDelegate.root.store.root
          def p = ''
          path.tokenize('/').each() {
            p = p + '/' + it
            nb = nb.getChildNode(it)
            println "${p} ${nb}"
          }
        } finally {
          session.logout()
        }
      }
      traverse("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7")
      

      [2]:

      def printNodeChildrenCache(def path) {
        def session = osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
        try {
          def rootNode = session.getRootNode()
          def cache = rootNode.sessionDelegate.root.store.nodeCache
          def cacheMap = cache.asMap()
          cacheMap.each{k,v -> 
            if (k.toString().startsWith(path + "@")) {
              println "${k}:${v}"
              println "lastRev -> ${v.lastRevision}"
            }
          }
        } finally {
          session.logout()
        }
      }
       
      printNodeChildrenCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
      

      Attachments

        1. Capture.JPG
          21 kB
          Vikas Saurabh
        2. Capture1.JPG
          73 kB
          Vikas Saurabh
        3. Capture2.JPG
          124 kB
          Vikas Saurabh
        4. oplog.json
          1.07 MB
          Vikas Saurabh
        5. diagram.png
          55 kB
          Marcel Reutegger

        Issue Links

          Activity

            People

              mreutegg Marcel Reutegger
              catholicon Vikas Saurabh
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: