Jackrabbit Content Repository
  1. Jackrabbit Content Repository
  2. JCR-905

Clustering: race condition may cause duplicate entries in search index

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1, 1.2.2, 1.2.3, 1.3, 1.3.1
    • Fix Version/s: 1.3.3
    • Component/s: clustering
    • Labels:
      None

      Description

      There seems to be a race condition that may cause duplicate search index entries. It is reproducible as follows (Jackrabbit 1.3):
      1) Start clusternode 1 that just adds a single node of node type clustering:test.
      2) Shutdown clusternode 1.
      3) Start clusternode 2 with an empty search index.
      4) Execute the query //element(*, clustering:test).
      4) Print the result of the query (UUIDs of nodes in the result set).

      When I just run clusternode 2, then there is one node in the resultset, as expected. However, when I debug clusternode 2 and have a breakpoint (i.e., a pause of a few seconds at line 306 of RepositoryImpl.java - just before the clusternode is started), then the resultset contains two results, both with the same UUID.

      1. SearchManager.patch
        0.8 kB
        Marcel Reutegger
      2. log2.txt
        2 kB
        Martijn Hendriks
      3. log1.txt
        1 kB
        Martijn Hendriks
      4. JCR-905.patch
        0.8 kB
        Martijn Hendriks

        Issue Links

          Activity

          Hide
          Martijn Hendriks added a comment -

          I think that the issue is caused by the fact that a Document for the node is created in two different indices as a result of the pause in scenario 2. Consider the attached log snippets. log1.txt shows scenario 1: everything is written to the volatile index. log2.txt shows what happens after a pause of a few seconds: the volatile index with the entry for node A (that has been generated by the index initialization) is written to disk, after which another Document containing node A is added to a new volatile index (as a result of an event that is generated by the cluster synchronization).

          (Please note that I added a custom debug statement to MultiIndex$AddNode).

          Show
          Martijn Hendriks added a comment - I think that the issue is caused by the fact that a Document for the node is created in two different indices as a result of the pause in scenario 2. Consider the attached log snippets. log1.txt shows scenario 1: everything is written to the volatile index. log2.txt shows what happens after a pause of a few seconds: the volatile index with the entry for node A (that has been generated by the index initialization) is written to disk, after which another Document containing node A is added to a new volatile index (as a result of an event that is generated by the cluster synchronization). (Please note that I added a custom debug statement to MultiIndex$AddNode).
          Hide
          Martijn Hendriks added a comment -

          We've found a one-line fix for this issue. The problem is that when the search index receives an event to add a node to the index, it does not take consider the persistent indices. The one-line fix is to first remove the node from the multi-index before adding it again: see attached patch.

          Show
          Martijn Hendriks added a comment - We've found a one-line fix for this issue. The problem is that when the search index receives an event to add a node to the index, it does not take consider the persistent indices. The one-line fix is to first remove the node from the multi-index before adding it again: see attached patch.
          Hide
          Jukka Zitting added a comment -

          Dominique/Marcel, do you see any potential regressions with this patch? I'm not confident enough to apply it in 1.3.1, but it sounds like it definitely should go in trunk before 1.4.

          Show
          Jukka Zitting added a comment - Dominique/Marcel, do you see any potential regressions with this patch? I'm not confident enough to apply it in 1.3.1, but it sounds like it definitely should go in trunk before 1.4.
          Hide
          Marcel Reutegger added a comment -

          This patch adds considerable overhead to the index process because for each added node the index has to first check if the node already exists. In lucene terms this means that lots of index readers and index writers are created and destroyed in a short period of time. The current code relies on the fact that the events passed to the query handler reflect a correct state change on the workspace. E.g. if an event says that a node is added, the index assumes that the node does not exist in the index.

          I see two ways to fix this issue:

          • The query handler does not automatically re-index the workspace, but rather re-plays the cluster-journal to get a valid index.
          • The query handler needs to associate a journal revision with the current index state. When journal events are processed the query handler will ignore events from the 'past'.

          I prefer option 2.

          Show
          Marcel Reutegger added a comment - This patch adds considerable overhead to the index process because for each added node the index has to first check if the node already exists. In lucene terms this means that lots of index readers and index writers are created and destroyed in a short period of time. The current code relies on the fact that the events passed to the query handler reflect a correct state change on the workspace. E.g. if an event says that a node is added, the index assumes that the node does not exist in the index. I see two ways to fix this issue: The query handler does not automatically re-index the workspace, but rather re-plays the cluster-journal to get a valid index. The query handler needs to associate a journal revision with the current index state. When journal events are processed the query handler will ignore events from the 'past'. I prefer option 2.
          Hide
          Martijn Hendriks added a comment -

          I already suspected that the proposed patch would give a significant overhead... Option 2 sounds elegant, but bootstrapping it looks non-trivial to me since the repository and the global revision can change during the re-indexing.

          Show
          Martijn Hendriks added a comment - I already suspected that the proposed patch would give a significant overhead... Option 2 sounds elegant, but bootstrapping it looks non-trivial to me since the repository and the global revision can change during the re-indexing.
          Hide
          Marcel Reutegger added a comment -

          Here's an alternative patch, which handles the possible duplicates earlier in the index process, where it is also possible to detect an external update. This avoids the overhead for Jackrabbit instance, which does not operate in a cluster.

          Can someone with a cluster installation please test the patch and give feedback. I'll then commit the patch if it fixes the issue.

          Show
          Marcel Reutegger added a comment - Here's an alternative patch, which handles the possible duplicates earlier in the index process, where it is also possible to detect an external update. This avoids the overhead for Jackrabbit instance, which does not operate in a cluster. Can someone with a cluster installation please test the patch and give feedback. I'll then commit the patch if it fixes the issue.
          Hide
          Martijn Hendriks added a comment -

          I just tested Marcel's patch and it works fine. It's good to eliminate the overhead for non-clustered installations!

          Best wishes,

          Martijn

          Show
          Martijn Hendriks added a comment - I just tested Marcel's patch and it works fine. It's good to eliminate the overhead for non-clustered installations! Best wishes, Martijn
          Hide
          Marcel Reutegger added a comment -

          Applied patch in revision: 576813

          Thank you all for testing.

          Show
          Marcel Reutegger added a comment - Applied patch in revision: 576813 Thank you all for testing.
          Hide
          Jukka Zitting added a comment -

          Merged to the 1.3 branch in revision 577835.

          Show
          Jukka Zitting added a comment - Merged to the 1.3 branch in revision 577835.

            People

            • Assignee:
              Unassigned
              Reporter:
              Martijn Hendriks
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development