Jackrabbit Oak / OAK-3099

Revision GC fails when split documents with very long paths are present

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.13
    • Fix Version/s: 1.0.18, 1.2.3, 1.3.3, 1.4
    • Component/s: mongomk
    • Labels: None

      Description

      My company is using the MongoDB microkernel with Oak, and we've noticed that the daily revision GC is failing with errors like this:

      13.07.2015 13:06:16.261 *ERROR* [pool-7-thread-1-Maintenance Queue(com/adobe/granite/maintenance/job/RevisionCleanupTask)] org.apache.jackrabbit.oak.management.ManagementOperation Revision garbage collection failed
      java.lang.IllegalArgumentException: 13:h113f9d0fe7ac0f87fa06397c37b9ffd4b372eeb1ec93e0818bb4024a32587820
      at org.apache.jackrabbit.oak.plugins.document.Revision.fromString(Revision.java:236)
      at org.apache.jackrabbit.oak.plugins.document.SplitDocumentCleanUp.disconnect(SplitDocumentCleanUp.java:84)
      at org.apache.jackrabbit.oak.plugins.document.SplitDocumentCleanUp.disconnect(SplitDocumentCleanUp.java:56)
      at org.apache.jackrabbit.oak.plugins.document.VersionGCSupport.deleteSplitDocuments(VersionGCSupport.java:53)
      at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.collectSplitDocuments(VersionGarbageCollector.java:117)
      at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.gc(VersionGarbageCollector.java:105)
      at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService$2.run(DocumentNodeStoreService.java:511)
      at org.apache.jackrabbit.oak.spi.state.RevisionGC$1.call(RevisionGC.java:68)
      at org.apache.jackrabbit.oak.spi.state.RevisionGC$1.call(RevisionGC.java:64)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      

      I've narrowed the issue down to the disconnect(NodeDocument) method of the SplitDocumentCleanUp class. The method always tries to extract the node's path from the document ID, but this fails for documents with very long paths, because their IDs contain a hash of the path rather than the path itself.

      I believe this code should fix the issue, but I haven't had a chance to actually try it:

          private void disconnect(NodeDocument splitDoc) {
              // splitId is only used for logging below
              String splitId = splitDoc.getId();
              String mainId = Utils.getIdFromPath(splitDoc.getMainPath());
              NodeDocument doc = store.find(NODES, mainId);
              if (doc == null) {
                  LOG.warn("Main document {} already removed. Split document is {}",
                          mainId, splitId);
                  return;
              }
              // use the path rather than the ID: for long paths the ID
              // holds a hash, so the revision cannot be parsed from it
              String path = splitDoc.getPath();
              // last path segment is the height, the one before it the revision
              int slashIdx = path.lastIndexOf('/');
              int height = Integer.parseInt(path.substring(slashIdx + 1));
              Revision rev = Revision.fromString(
                      path.substring(path.lastIndexOf('/', slashIdx - 1) + 1, slashIdx));
              doc = doc.findPrevReferencingDoc(rev, height);
              if (doc == null) {
                  LOG.warn("Split document {} not referenced anymore. Main document is {}",
                          splitId, mainId);
                  return;
              }
              // remove reference
              if (doc.getSplitDocType() == INTERMEDIATE) {
                  disconnectFromIntermediate(doc, rev);
              } else {
                  markStaleOnMain(doc, rev, height);
              }
          }
      

      By using getPath(), the code should automatically use either the ID or the _path property, whichever is right for the document.
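
To make the failure mode concrete, here is a minimal standalone sketch (not Oak code) of the segment parsing used above. The class and method names (SplitDocPathSketch, RevAndHeight, parseTail) and all sample values are hypothetical. It assumes, as the proposed fix does, that a split document's path ends in `<revision>/<height>`, and it further assumes that a hashed ID still ends in `/<height>`, which would be consistent with the `13:h...` string reaching Revision.fromString in the stack trace.

```java
// Hypothetical illustration only; none of these names exist in Oak.
final class SplitDocPathSketch {

    /** The two trailing segments extracted from a split document path or ID. */
    static final class RevAndHeight {
        final String revision;
        final int height;

        RevAndHeight(String revision, int height) {
            this.revision = revision;
            this.height = height;
        }
    }

    /**
     * Same string slicing as in disconnect(): the last segment is read as
     * the height, the segment before it as the revision string.
     */
    static RevAndHeight parseTail(String pathOrId) {
        int slashIdx = pathOrId.lastIndexOf('/');
        int height = Integer.parseInt(pathOrId.substring(slashIdx + 1));
        // If there is no earlier slash, lastIndexOf returns -1 and the
        // "revision" becomes everything before the final slash.
        String rev = pathOrId.substring(
                pathOrId.lastIndexOf('/', slashIdx - 1) + 1, slashIdx);
        return new RevAndHeight(rev, height);
    }

    public static void main(String[] args) {
        // Made-up path with a revision-like segment followed by a height.
        RevAndHeight fromPath = parseTail("p/content/site/page/r14e8b1c2d3f-0-1/2");
        System.out.println(fromPath.revision + " height=" + fromPath.height);

        // Made-up hashed ID (assumed shape): the slice yields the hash
        // prefix instead of a revision, which a revision parser would reject.
        RevAndHeight fromHashedId = parseTail("13:habcdef/0");
        System.out.println(fromHashedId.revision + " height=" + fromHashedId.height);
    }
}
```

With the path as input the second-to-last segment is the revision string; with the hashed ID the same slice returns `13:habcdef`, which is not a revision.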

        Attachments

        1. SplitDocumentGenerator.java (2 kB, Csaba Varga)
        2. OAK-3099.patch (5 kB, Amit Jain)

            People

            • Assignee: Amit Jain
            • Reporter: Csaba Varga
            • Votes: 0
            • Watchers: 5
