Jackrabbit Oak / OAK-2359

read is inefficient when there are many split documents

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.8
    • Fix Version/s: 1.0.10, 1.1.4
    • Component/s: core
    • Labels: None
    • Environment: 1.0.8.r1644758

    Description

      As reported in OAK-2358, there is a potential problem with revisionGC not cleaning up split documents properly (in 1.0.8.r1644758 at least).

      As a side effect, the many garbage revisions make the diffImpl algorithm very slow. Normally it takes only a few milliseconds, but for nodes with many split documents I can see diffImpl take hundreds of milliseconds, sometimes up to a few seconds. As a result, observation events are dequeued more slowly than they are enqueued, so the observation queue is never drained and event handling is delayed more and more.
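To make the backlog argument concrete, here is a minimal sketch in Java. It is a simplified model, not Oak code: the class name, the method, and all of the numbers are hypothetical, and it assumes one event is enqueued at a fixed interval while a single consumer needs one diffImpl call per event.

```java
// Hypothetical model of the observation queue; not part of Oak.
final class ObservationBacklog {

    /**
     * Returns how many events are still queued after durationMillis,
     * assuming one event is enqueued every enqueueIntervalMillis and a
     * single consumer spends diffMillis per event.
     */
    static long backlogAfter(long durationMillis,
                             long enqueueIntervalMillis,
                             long diffMillis) {
        long enqueued = durationMillis / enqueueIntervalMillis;
        // The consumer cannot process more events than were enqueued.
        long dequeued = Math.min(enqueued, durationMillis / diffMillis);
        return enqueued - dequeued;
    }
}
```

With an assumed event every 100 ms, `backlogAfter(60_000, 100, 5)` returns 0, while `backlogAfter(60_000, 100, 500)` returns 480: once a diff takes longer than the enqueue interval, the backlog grows without bound.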

      Adding some logging showed that diffImpl often reads many split documents, which supports the assumption that the diffImpl slowness is a side effect of revisionGC not cleaning up revisions. Having said that, diffImpl should probably still be able to run fast, since all the revisions it needs to look at should be in the main document, not in split documents.
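The expectation that recent reads stay in the main document can be illustrated with a small Java sketch. This is not Oak's NodeDocument implementation and not the attached patch: the types, fields, and the `splitReads` counter are hypothetical, revisions are modelled as plain long values, and it assumes every revision in a split document is older than every revision still held in the main document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a document with split (previous) documents.
final class SplitAwareLookup {

    /** A split document holding older revisions, the lowest being 'low'. */
    static final class SplitDoc {
        final long low;
        final long high;
        final NavigableMap<Long, String> values = new TreeMap<>();

        SplitDoc(long low, long high) {
            this.low = low;
            this.high = high;
        }
    }

    final NavigableMap<Long, String> main = new TreeMap<>();
    final List<SplitDoc> splits = new ArrayList<>();
    int splitReads = 0; // counts how many split documents were consulted

    /** Returns the newest value at or before readRevision, or null. */
    String read(long readRevision) {
        Map.Entry<Long, String> hit = main.floorEntry(readRevision);
        if (hit != null) {
            return hit.getValue(); // answered without touching split documents
        }
        String best = null;
        long bestRevision = Long.MIN_VALUE;
        for (SplitDoc split : splits) {
            if (split.low > readRevision) {
                continue; // every revision in this split is newer than the read
            }
            splitReads++;
            Map.Entry<Long, String> e = split.values.floorEntry(readRevision);
            if (e != null && e.getKey() > bestRevision) {
                bestRevision = e.getKey();
                best = e.getValue();
            }
        }
        return best;
    }
}
```

In this model, a read at a recent revision is answered from `main` alone and `splitReads` stays at 0, no matter how many split documents have accumulated. Split documents are only consulted when the main document has nothing at or before the requested revision.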

      Unfortunately I don't have a test case handy for this at the moment. If more comes up, I'll add more details here.

      Attachments

        1. oak2359patch.diff
          7 kB
          Stefan Egli

            People

              Assignee: Marcel Reutegger (mreutegg)
              Reporter: Stefan Egli (stefanegli)
              Votes: 0
              Watchers: 2
