Jackrabbit Content Repository
JCR-3162

Index update overhead on cluster slave due to JCR-905

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.6
    • Component/s: clustering
    • Labels:
      None

      Description

      JCR-905 is a quick and dirty fix and causes overhead on a cluster slave node when it processes revisions.

      Attachments

      1. JCR-3162.patch
        7 kB
        Alex Parvulescu
      2. JCR-3162-v2.patch
        15 kB
        Alex Parvulescu
      3. JCR-3162-v3.patch
        29 kB
        Alex Parvulescu
      4. JCR-3162-v4.patch
        24 kB
        Alex Parvulescu
      5. JCR-3162-v5.patch
        18 kB
        Marcel Reutegger
      6. DbClusterTestJCR3162.patch
        3 kB
        Marcel Reutegger

        Activity

        Alex Parvulescu added a comment -

        Studying this problem revealed that this issue happens whenever we are dealing with a cluster sync operation involving an instance that has been running for a really long time.

        At this point I'm not sure what "a really long time" means exactly, but it appears that after a while the journal revision resets to 0.
        This causes the cluster slave to sync using a lower revision number, fetching the journal records again, which in turn causes the repository to index them again.
        If the current index already corresponds to a higher revision number, re-indexing those records means that there will be duplicates in the index.

        JCR-905 tried to address that by first deleting all the records that come from an external source (the cluster sync) before adding them.

        The proposed solution tries to determine on repository startup whether the index is stale and, if so, forces a full reindex by deleting it.
        Index staleness is currently determined by checking whether the journal revision is 0 and whether index files are already present in the repository.
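        As a rough sketch of that check (the method and names below are hypothetical, not the actual patch; only the two conditions, revision 0 plus pre-existing index files, come from this description):

          // Hypothetical illustration of the staleness check described above;
          // the names are made up, only the two conditions come from this comment.
          private boolean isIndexStale(long localJournalRevision, java.io.File indexDirectory) {
              // revision 0 means this node believes it has never synced, yet
              // index files on disk say an index was already built earlier
              String[] indexFiles = indexDirectory.isDirectory() ? indexDirectory.list() : null;
              boolean indexFilesPresent = indexFiles != null && indexFiles.length > 0;
              return localJournalRevision == 0 && indexFilesPresent;
          }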

        Interestingly, this happens a lot during tests, where the index is preserved from one restart to the next but the journal implementation is memory based, so it gets reset every time.

        The solution has some issues because of the asynchronous initialization of the SearchIndex for workspaces other than "default": by the time the SearchIndex gets initialized, the cluster node has already synced to a revision greater than 0, even if the revision was 0 when the repository was starting up.
        This doesn't apply to the default workspace, though.

        Alex Parvulescu added a comment -

        attaching patch.

        Warning: this is a work in progress, not really tested.
        It includes changes from JCR-3160 (as some files overlap).

        Feedback needed.

        Alex Parvulescu added a comment -

        Attaching v2 of the patch.

        It is basically v1 + a test + some feature flags.
        I managed to find a way to test this problem within JR Core, which should allow us to have some meaningful progress.

        The test DbClusterTestJCR3162 contains 2 feature flags (which will be removed once the patch is good):

        • JCR3162_ENABLE_CLUSTER_DELETE: this is the JCR-905 fix, which can be switched on/off with this flag
        • JCR3162_ENABLE_REINDEX: this controls the new code (reindexing if the index is considered stale)

        So basically:

        • both flags off: you can see the initial behaviour (pre JCR-905) in action. The test fails (as it should).
        • just JCR3162_ENABLE_CLUSTER_DELETE represents the current behaviour of the repository (the test passes)
        • just JCR3162_ENABLE_REINDEX represents the patch in action (the test passes!)

        The original problem still stands: how do we know whether an index is stale or not? The default workspace will be covered, but the other workspaces are initialized asynchronously, so there are no guarantees.

        Alex Parvulescu added a comment -

        V3 comes with a complete redesign of the patch.

        After further analysis we've decided to go with inspecting the incoming journal changes in the case of an initial index re-build.

        I'll try to clarify. The scope of the JCR-905 fix should only be the initial index build.
        The initial indexing operation can cause doubles to appear, as some nodes can be seen by a slave before the ADD event has reached it. This happens because of the shared storage between cluster nodes.
        So, when a slave starts to re-index the repository content, it will include everything (potentially also nodes that it hasn't received an ADD event for yet).
        When the index finishes, the repository will continue its startup. A bit later, the cluster component will also initialize and consequently sync. This will pull in the ADD events that were pending in a newer revision on the master.

        V3 polls the changes before the cluster sync call and preemptively generates DELETE events for all the ADD events it finds for the current workspace
        (this is similar to the JCR-905 patch, but with a much smaller scope).
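        As a rough illustration of that idea (the type and method below are simplified stand-ins, not Jackrabbit's real classes or the actual patch):

          // Sketch of the "DELETE before ADD" preprocessing described above.
          // IndexEvent is a hypothetical stand-in, not a real Jackrabbit class.
          java.util.List<IndexEvent> preprocess(java.util.List<IndexEvent> pending, String workspace) {
              java.util.List<IndexEvent> result = new java.util.ArrayList<IndexEvent>();
              for (IndexEvent e : pending) {
                  if (e.isAdd() && workspace.equals(e.getWorkspace())) {
                      // the node may already be in the freshly built index,
                      // so remove it first to avoid a duplicate entry
                      result.add(IndexEvent.delete(e.getNodeId()));
                  }
                  result.add(e);
              }
              return result;
          }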

        Another feature introduced in the patch is to force flush the index after the initial index has been created.
        This was artificially done in the original test case (no unit test though) by:
        > However, when I debug clusternode 2 and have a breakpoint (i.e., a pause of a few seconds at line 306 of RepositoryImpl.java - just before the clusternode is started), then the resultset contains two results, both with the same UUID.

        So forcing the index flush correctly reproduces the original problem, and I think it should be the correct behaviour of the initial index creation.
        On the other hand, not flushing the index will hide the problem because the indexing queue is smart enough to remove doubles.

        But, flushing the index basically invalidates JCR-905, which is a bit unexpected (see attached patch, by switching the feature flags off).

        On the code itself: I guess the AbstractJournal could use a bit of refactoring on the event polling side.

        Marcel Reutegger added a comment -

        Thanks for the patch.

        I see that you moved the initial index creation from the SearchIndex to the MultiIndex. This is how it was before JCR-1064, where we had to move it out of the MultiIndex. I'm not sure if it is now safe to move the initial index creation trigger back to the SearchIndex...

        There are some imports which look odd. The class ClusterNode imports the MultiIndex; this is probably just minor because it is not really required to compile the class, but only to resolve the javadoc reference. More troublesome is the import of DbClusterTestJCR3162 in SearchManager. That doesn't work because DbClusterTestJCR3162 is a test class.

        I'm also wondering if it is possible to fix this issue without introducing too many changes to the journal/cluster code. Wouldn't it be sufficient to use the existing journal and cluster node interface to get changes from a given revision?

        Alex Parvulescu added a comment -

        hi, thanks for the feedback.

        > I see that you moved the initial index creation from the SearchIndex to the MultiIndex. This is how it was before JCR-1064, where we had to move it out of the MultiIndex. I'm not sure if it is now safe to move the initial index creation trigger back to the SearchIndex...

        I wasn't aware of that move. I can put it back; it just felt a bit off to have code that is potentially internal to MultiIndex controlled from outside by SearchIndex.

        > The class ClusterNode imports the MultiIndex, though this is probably just minor because it is not really required to compile the class, but only to resolve the javadoc reference
        True. We can remove that import.

        > More troublesome is the import of DbClusterTestJCR3162 in SearchManager. That doesn't work because DbClusterTestJCR3162 is a test class.
        That is just the feature flag. It will be removed once we agree on what the code should look like.

        >Wouldn't it be sufficient to use the existing journal and cluster node interface to get changes from a given revision?
        I'm not aware of any such interface. You have the local revision number available, but not the actual changes; that is why I've added the 'poll' operation.

        Now I'm investigating a locking issue on Cluster.sync introduced by this patch, so I'll upload another patch soon that takes this feedback into consideration.

        Marcel Reutegger added a comment -

        I think Journal.getRecords(long startRevision) could be used for that purpose.
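        For illustration, reading the pending records from the instance's current revision might look roughly like the sketch below; Journal.getRecords(long) is the method mentioned above, while the revision source and the exact iterator details are assumptions, and error handling is omitted:

          // Sketch only: iterate the journal records newer than this
          // instance's local revision and inspect each one.
          void inspectPendingRecords(Journal journal, long localRevision) throws JournalException {
              RecordIterator records = journal.getRecords(localRevision);
              try {
                  while (records.hasNext()) {
                      Record record = records.nextRecord();
                      // e.g. pre-generate DELETE events for ADD events that are
                      // already visible through the shared persistence store
                  }
              } finally {
                  records.close();
              }
          }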

        Alex Parvulescu added a comment -

        > I think Journal.getRecords(long startRevision) could be used for that purpose.

        Yes, that makes perfect sense! (And it gets rid of my lock problem.)

        Attaching v4:

        • using Journal.getRecords
        • cleanup of imports
        • moved feature flags from 'test' code to main code (I'll remove them entirely once the code is good)
        Marcel Reutegger added a comment -

        Looks better now to me.

        Some more comments:

        • I would rather put getting the change log records into the SearchIndex, to not expand the api of the cluster node unnecessarily.
        • getChangeLogRecords() should check the producer id of the record
        • processing of ClusterRecords can be done without instanceof check
        • Record.getWorkspace() may return null for operations on the version storage (RepositoryImpl.getSearchManager() should pass null as workspace name?)
        • instance revision is also available from ClusterNode.getRevision()

        I'll attach a patch.

        Alex Parvulescu added a comment -

        thanks for taking the time to rewrite the patch!

        It looks good to me.
        I'll remove the feature flags and the ol' dirty fix and commit the code.

        Alex Parvulescu added a comment -

        Fixed in revision 1214329.

        Marcel Reutegger added a comment -

        > I'll remove the feature flags and the ol' dirty fix and commit the code.

        My patch was just a quick rewrite of your patch to illustrate my comments. I didn't run any tests with it. Did you run the tests? Otherwise I suggest we revert the commit for now...

        Alex Parvulescu added a comment -

        Yes, I did. Do you see any tests failing?

        Is there any reason for reverting the change?

        Marcel Reutegger added a comment -

        I also noticed that the test is not included in the TestAll suite of the respective package. This means the new test is not run with every build.

        The new test uses annotations to mark the test methods, which makes it difficult to include in the TestAll suite. Is there a specific reason why you used annotations instead of inheriting from JUnitTest?
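        For reference, the JUnit 3 style this refers to looks roughly like the sketch below: a test that extends TestCase can be added to the package's TestAll suite, whereas an annotation-based test cannot (class and package names here are illustrative, not the actual Jackrabbit sources, and it assumes DbClusterTestJCR3162 extends TestCase):

          import junit.framework.Test;
          import junit.framework.TestCase;
          import junit.framework.TestSuite;

          // Illustrative TestAll suite for the cluster test package.
          public class TestAll extends TestCase {
              public static Test suite() {
                  TestSuite suite = new TestSuite("cluster tests");
                  suite.addTestSuite(DbClusterTestJCR3162.class);
                  return suite;
              }
          }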

        I quickly extended the test to make the test nodes versionable and to check consistency on the complete workspace (including the version storage). Running the test now results in random failures for me.

        See attached patch.

        Marcel Reutegger added a comment -

        To me it looks like this is not yet fixed.

        Alex Parvulescu added a comment -

        Reverted the patch.

        It seems that there are some problems with the consistency check on the versioning store.

        I'm not sure it is related to this code, but just to be on the safe side I've reverted the changes.

        There are 2 issues blocking this one, both tied to the queries on the versioning store:

        • some queries just fail (JCR-3182)
        • some queries are built wrong (possibly because of the same issue)
          Path
          '/jcr:system/jcr:versionStorage/00'
          Generated SQL2:
          SELECT NODE.* FROM [nt:base] AS NODE WHERE ISCHILDNODE(NODE, [/jcr:system/jcr:versionStorage/0])
        Marcel Reutegger added a comment -

        I can confirm that the test runs successfully when the path in the query statement is quoted.

        Now that we know what's going on, I think we can apply the patch again. Sorry for being overly cautious.

        Alex Parvulescu added a comment -

        Fixed in rev 1214842.

        False alarm: the errors came from an issue with the SQL2 parser (JCR-3182).
        As it turns out, we have a workaround for it: quoting the path helps the parser work properly.
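        For example (illustrative only, assuming the workaround is simply to quote the path argument of ISCHILDNODE), the unquoted form loses part of the path while the quoted form keeps it intact:

          Unquoted (path gets mangled by the parser):
          SELECT NODE.* FROM [nt:base] AS NODE WHERE ISCHILDNODE(NODE, [/jcr:system/jcr:versionStorage/0])

          Quoted (path preserved):
          SELECT NODE.* FROM [nt:base] AS NODE WHERE ISCHILDNODE(NODE, '/jcr:system/jcr:versionStorage/00')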


          People

          • Assignee:
            Unassigned
            Reporter:
            Alex Parvulescu
          • Votes:
            0
            Watchers:
            0
