Jackrabbit Content Repository
  1. Jackrabbit Content Repository
  2. JCR-1753

Allow means force a Repository to synchronize with the cluster

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6
    • Component/s: clustering, jackrabbit-core
    • Labels:
      None

      Description

      Based on the thread on the user mailing list I'm logging this to propose adding a sync() method to force cluster synchronization using the JackrabbitRepository extension API.

      The purpose of the method is such that in a distributed clustered environment sometime cluster synchronization does or has not occurred such that certain repositories are in a stale state. This method would provide a means to force a repository to update pull in possible changes made by other Jackrabbit repositories.

        Activity

        Micah Whitacre created issue -
        Hide
        Micah Whitacre added a comment -

        I forgot to include the link to the mailing list thread in the original bug description[1].

        The attachment contains patches for the 1.4 branch of the jackrabbit-core and jackrabbit-api projects. It adds the method forceClusterSync() to the JackrabbitRepository interface and implements the method in the three implementations of the interface.

        The patches stray from Jukka's suggestion of naming the method sync() as the use case I was needing solved involved a clustered environment. sync() seems general enough that it might indicate the repository implementation would pick up any changes made to the data store. I know on the mailing list altering the database directly instead of through JCR/Jackrabbit API is discouraged so perhaps that isn't a valid use case.

        I'm also not completely familiar with all use cases and therefore the implementations of the method only work in a clustered environment. I assume that multiple repositories hitting the same database in an unclustered environment could be considered invalid. However if this assumption is incorrect then the code will need to be changed to handle that.

        [1] - http://www.nabble.com/Forcing-a-cluster-synch-td19578255.html

        Show
        Micah Whitacre added a comment - I forgot to include the link to the mailing list thread in the original bug description [1] . The attachment contains patches for the 1.4 branch of the jackrabbit-core and jackrabbit-api projects. It adds the method forceClusterSync() to the JackrabbitRepository interface and implements the method in the three implementations of the interface. The patches stray from Jukka's suggestion of naming the method sync() as the use case I was needing solved involved a clustered environment. sync() seems general enough that it might indicate the repository implementation would pick up any changes made to the data store. I know on the mailing list altering the database directly instead of through JCR/Jackrabbit API is discouraged so perhaps that isn't a valid use case. I'm also not completely familiar with all use cases and therefore the implementations of the method only work in a clustered environment. I assume that multiple repositories hitting the same database in an unclustered environment could be considered invalid. However if this assumption is incorrect then the code will need to be changed to handle that. [1] - http://www.nabble.com/Forcing-a-cluster-synch-td19578255.html
        Micah Whitacre made changes -
        Field Original Value New Value
        Attachment JCR-1753.tar.gz [ 12390678 ]
        Hide
        Jukka Zitting added a comment -

        Hmm, there's a somewhat releated long discussion ([1] and [2]) about JackrabbitRepository.shutdown() that we had earlier this year. This method is not nearly as intrusive as the shutdown() method, but there might be valid reasons why it shouldn't be exposed to just any client.

        [1] http://markmail.org/message/fqyypq5x6bma4ike
        [2] http://markmail.org/message/dsqyv3rafo4j5xea

        Show
        Jukka Zitting added a comment - Hmm, there's a somewhat releated long discussion ( [1] and [2] ) about JackrabbitRepository.shutdown() that we had earlier this year. This method is not nearly as intrusive as the shutdown() method, but there might be valid reasons why it shouldn't be exposed to just any client. [1] http://markmail.org/message/fqyypq5x6bma4ike [2] http://markmail.org/message/dsqyv3rafo4j5xea
        Jukka Zitting made changes -
        Fix Version/s 1.5 [ 12312920 ]
        Affects Version/s core 1.4.5 [ 12313163 ]
        Summary Allow means force a Repository to synchronize with the data store Allow means force a Repository to synchronize with the cluster
        Assignee Jukka Zitting [ jukkaz ]
        Hide
        Marcel Reutegger added a comment -

        I have the same concerns regarding the visibility of the method.

        I'm also wondering if this is really needed. IIUC the cluster works like an eventually consistent system (see [1] for an introduction). If the application can guarantee 'session consistency' (i.e. an application session always uses a given cluster node and only switches to another cluster node when it goes down) the consequences of the inconsistency window should be very limited.

        Micah, can you please elaborate why you need a cluster node to be in sync? Even if you can sync a cluster node, it might get out of sync again right away by a modification of another cluster node.

        [1] http://www.allthingsdistributed.com/2007/12/eventually_consistent.html

        Show
        Marcel Reutegger added a comment - I have the same concerns regarding the visibility of the method. I'm also wondering if this is really needed. IIUC the cluster works like an eventually consistent system (see [1] for an introduction). If the application can guarantee 'session consistency' (i.e. an application session always uses a given cluster node and only switches to another cluster node when it goes down) the consequences of the inconsistency window should be very limited. Micah, can you please elaborate why you need a cluster node to be in sync? Even if you can sync a cluster node, it might get out of sync again right away by a modification of another cluster node. [1] http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
        Hide
        Micah Whitacre added a comment -

        My use case is described in the mailing list link included in the first comment. I read the article you linked to and it was very interesting. In my situation we are shooting for session consistency however there is not means to guarantee the stickiness of the session on the server side. The setup I have is operations are routed between 3 different JVMs and each JVM is read/writing to the JCR repository. So the use case I'm shooting for is client 1 performs writes which get routed to JVM1. The same client then performs a read on that write however the operation is routed to JVM2. In this situation I know that the write operation has occurred but when retrieving from the repository get a PathNotFoundException. So in that case I'd like to sync to update JVM2. After I sync I attempt to read and either get the value I'm looking for or I don't. If i don't then I know that a concurrent modification has occurred and report the appropriate response back to the client.

        Show
        Micah Whitacre added a comment - My use case is described in the mailing list link included in the first comment. I read the article you linked to and it was very interesting. In my situation we are shooting for session consistency however there is not means to guarantee the stickiness of the session on the server side. The setup I have is operations are routed between 3 different JVMs and each JVM is read/writing to the JCR repository. So the use case I'm shooting for is client 1 performs writes which get routed to JVM1. The same client then performs a read on that write however the operation is routed to JVM2. In this situation I know that the write operation has occurred but when retrieving from the repository get a PathNotFoundException. So in that case I'd like to sync to update JVM2. After I sync I attempt to read and either get the value I'm looking for or I don't. If i don't then I know that a concurrent modification has occurred and report the appropriate response back to the client.
        Hide
        Jukka Zitting added a comment -

        How about if we made Repository.login() and Session.refresh() automatically trigger a cluster sync? I think that would be well in line with both the semantic and performance expectations a typical client would have.

        Show
        Jukka Zitting added a comment - How about if we made Repository.login() and Session.refresh() automatically trigger a cluster sync? I think that would be well in line with both the semantic and performance expectations a typical client would have.
        Jukka Zitting made changes -
        Fix Version/s 1.5 [ 12312920 ]
        Hide
        Dominique Pfister added a comment -

        I'm a bit concerned about the number of sync() calls that might pile up in the end: at the moment, the sync() operation is guarded by a Mutex, roughly like this:

        mutex.acquire();
        try

        { doSync(); }

        finally

        { mutex.release(); }

        a lot of threads might potentially end up handling Repository.login() and Session.refresh() and would therefore all in turn wait for each other, only to start the sync() operation again. I'd therefore change the semantics of the method described above to:

        (1) either return immediately if a thread is already inside doSync() is already in progress
        (2) or wait for the processing thread in doSync() to finish and then return

        Optionally, the wait in (2) might again be limited by a timeout in order to avoid locking the complete system.

        WDYT?

        Show
        Dominique Pfister added a comment - I'm a bit concerned about the number of sync() calls that might pile up in the end: at the moment, the sync() operation is guarded by a Mutex, roughly like this: mutex.acquire(); try { doSync(); } finally { mutex.release(); } a lot of threads might potentially end up handling Repository.login() and Session.refresh() and would therefore all in turn wait for each other, only to start the sync() operation again. I'd therefore change the semantics of the method described above to: (1) either return immediately if a thread is already inside doSync() is already in progress (2) or wait for the processing thread in doSync() to finish and then return Optionally, the wait in (2) might again be limited by a timeout in order to avoid locking the complete system. WDYT?
        Hide
        Jukka Zitting added a comment -

        I'm not sure that there will be that many syncs, especially if we don't trigger the sync from Repository.login() calls. Additionally, if the node already is in sync with the cluster (which would be the usual case), then the sync amounts to just retrieving the journal revision number and comparing it with the local state.

        If the retrieval of the journal revision number is expensive, we could add an additional local guard that just ensures that at least one cluster sync has been done between a client calling Session.refresh() (or Repository.login()) and the call returning. Whether it's that thread or some other thread doing the sync doesn't matter. Something like this:

        private volatile int syncCount = 0;

        int count = syncCount;
        mutex.acquire();
        try {
        if (count == syncCount)

        { doSync(); syncCount++; }

        } finally

        { mutex.release(); }
        Show
        Jukka Zitting added a comment - I'm not sure that there will be that many syncs, especially if we don't trigger the sync from Repository.login() calls. Additionally, if the node already is in sync with the cluster (which would be the usual case), then the sync amounts to just retrieving the journal revision number and comparing it with the local state. If the retrieval of the journal revision number is expensive, we could add an additional local guard that just ensures that at least one cluster sync has been done between a client calling Session.refresh() (or Repository.login()) and the call returning. Whether it's that thread or some other thread doing the sync doesn't matter. Something like this: private volatile int syncCount = 0; int count = syncCount; mutex.acquire(); try { if (count == syncCount) { doSync(); syncCount++; } } finally { mutex.release(); }
        Hide
        Jukka Zitting added a comment -

        Attached a patch (0001-JCR-1753-Allow-means-force-a-Repository-to-synchron.patch) that triggers cluster synchronization in Repository.login() and Session.refresh().

        Show
        Jukka Zitting added a comment - Attached a patch (0001- JCR-1753 -Allow-means-force-a-Repository-to-synchron.patch) that triggers cluster synchronization in Repository.login() and Session.refresh().
        Jukka Zitting made changes -
        Hide
        Jukka Zitting added a comment -

        Attached a patch (0002-JCR-1753-Allow-means-force-a-Repository-to-synchron.patch) that implements the syncCount guard variable.

        Show
        Jukka Zitting added a comment - Attached a patch (0002- JCR-1753 -Allow-means-force-a-Repository-to-synchron.patch) that implements the syncCount guard variable.
        Jukka Zitting made changes -
        Hide
        Dominique Pfister added a comment -

        Patch looks good to me, apart from the missing "h" in method name "syncWitCluster"

        I'm still uneasy with the idea of having journal.sync() being potentially called with every Repository.login() and Session.refresh(), thinking of applications that e.g. test the repository availability by intermittently calling Repository.login(). What about specifying a lower bound on the delay (e.g. 1 second) that should pass before another journal sync actually takes place?

        Show
        Dominique Pfister added a comment - Patch looks good to me, apart from the missing "h" in method name "syncWitCluster" I'm still uneasy with the idea of having journal.sync() being potentially called with every Repository.login() and Session.refresh(), thinking of applications that e.g. test the repository availability by intermittently calling Repository.login(). What about specifying a lower bound on the delay (e.g. 1 second) that should pass before another journal sync actually takes place?
        Hide
        Jukka Zitting added a comment -

        > missing "h" in method name "syncWitCluster"

        Good catch, thanks!

        > I'm still uneasy with the idea of having journal.sync() being potentially called with every
        > Repository.login() and Session.refresh(), thinking of applications that e.g. test the repository
        > availability by intermittently calling Repository.login().

        The performance hit is a single SELECT statement (for DatabaseJournal) or a directory listing (for FileJournal). I don't think that's too much, but you're right in that there are cases (one example is a web site without a session pool) where avoiding any extra perfromance hit on login() would be beneficial.

        I think it would be OK if we didn't trigger the cluster sync on login(). A client could still use refresh() to ensure causal consistency (Micah's use case).

        > What about specifying a lower bound on the delay (e.g. 1 second) that should pass before
        > another journal sync actually takes place?

        That would kind of defeat the purpose as the client would then have no way (apart from explicitly waiting for that one second and retrying the sync) to ensure consistent access to the repository.

        Show
        Jukka Zitting added a comment - > missing "h" in method name "syncWitCluster" Good catch, thanks! > I'm still uneasy with the idea of having journal.sync() being potentially called with every > Repository.login() and Session.refresh(), thinking of applications that e.g. test the repository > availability by intermittently calling Repository.login(). The performance hit is a single SELECT statement (for DatabaseJournal) or a directory listing (for FileJournal). I don't think that's too much, but you're right in that there are cases (one example is a web site without a session pool) where avoiding any extra perfromance hit on login() would be beneficial. I think it would be OK if we didn't trigger the cluster sync on login(). A client could still use refresh() to ensure causal consistency (Micah's use case). > What about specifying a lower bound on the delay (e.g. 1 second) that should pass before > another journal sync actually takes place? That would kind of defeat the purpose as the client would then have no way (apart from explicitly waiting for that one second and retrying the sync) to ensure consistent access to the repository.
        Hide
        Thomas Mueller added a comment -

        We need a configurable lower bound on the delay. Some applications may call refresh() a lot. If there was a change, the cluster mechanism will call PersistenceManager.onExternalUpdate(), which is potentially expensive. Without a lower bound on the delay, the system may get unusably slow. For example, if there is a background thread that calls Session.refresh() once a second, and if PersistenceManager.onExternalUpdate() takes one second.

        Show
        Thomas Mueller added a comment - We need a configurable lower bound on the delay. Some applications may call refresh() a lot. If there was a change, the cluster mechanism will call PersistenceManager.onExternalUpdate(), which is potentially expensive. Without a lower bound on the delay, the system may get unusably slow. For example, if there is a background thread that calls Session.refresh() once a second, and if PersistenceManager.onExternalUpdate() takes one second.
        Hide
        Jukka Zitting added a comment -

        > For example, if there is a background thread that calls Session.refresh() once a second,
        > and if PersistenceManager.onExternalUpdate() takes one second.

        Why is that a problem? With the syncCount patch any later refresh() methods will just wait for the first invocation to finish before just returning without invoking another sync. Also, the usual case is that there are no changes to report, so the amortized cost of the refresh() method is still pretty small.

        The sync mutex already implements a lower bound delay that prevents any two syncs from executing concurrently. Adding more delay is IMHO stepping to the client territory. The client may have a good reason to want to sync more frequently (Micah's round robin case is an excellent example) and I see no reason why the repository should explicitly try to degrade the performance below what the hardware is capable of.

        Show
        Jukka Zitting added a comment - > For example, if there is a background thread that calls Session.refresh() once a second, > and if PersistenceManager.onExternalUpdate() takes one second. Why is that a problem? With the syncCount patch any later refresh() methods will just wait for the first invocation to finish before just returning without invoking another sync. Also, the usual case is that there are no changes to report, so the amortized cost of the refresh() method is still pretty small. The sync mutex already implements a lower bound delay that prevents any two syncs from executing concurrently. Adding more delay is IMHO stepping to the client territory. The client may have a good reason to want to sync more frequently (Micah's round robin case is an excellent example) and I see no reason why the repository should explicitly try to degrade the performance below what the hardware is capable of.
        Hide
        Jukka Zitting added a comment -

        I will create an alternative patch that only triggers the cluster sync in refresh() and includes a configuration option for disabling this feature.

        Show
        Jukka Zitting added a comment - I will create an alternative patch that only triggers the cluster sync in refresh() and includes a configuration option for disabling this feature.
        Hide
        Jukka Zitting added a comment -

        Attached the alternative patch (JCR-1753-sync-in-refresh.patch).

        With this method the client needs to call session.refresh(...) to force a cluster sync.

        I've also included a static (and public) flag variable, SessionImpl.clusterSyncOnRefresh, that an application can set to false if it wants to disable this feature (some applications may have come to expect refresh to be a nearly instantaneous operation, which no longer is true with this feature).

        Show
        Jukka Zitting added a comment - Attached the alternative patch ( JCR-1753 -sync-in-refresh.patch). With this method the client needs to call session.refresh(...) to force a cluster sync. I've also included a static (and public) flag variable, SessionImpl.clusterSyncOnRefresh, that an application can set to false if it wants to disable this feature (some applications may have come to expect refresh to be a nearly instantaneous operation, which no longer is true with this feature).
        Jukka Zitting made changes -
        Attachment JCR-1753-sync-in-refresh.patch [ 12394557 ]
        Hide
        Jukka Zitting added a comment -

        Dropping the jackrabbit-api component as the latest proposals are based on only the standard JCR API.

        Note that this improvement will not make it in Jackrabbit 1.5. Let's target the 1.6 version for this.

        Show
        Jukka Zitting added a comment - Dropping the jackrabbit-api component as the latest proposals are based on only the standard JCR API. Note that this improvement will not make it in Jackrabbit 1.5. Let's target the 1.6 version for this.
        Jukka Zitting made changes -
        Component/s jackrabbit-api [ 12310833 ]
        Fix Version/s 1.6.0 [ 12313459 ]
        Jukka Zitting made changes -
        Fix Version/s 1.6.0 [ 12313459 ]
        Jukka Zitting made changes -
        Workflow jira [ 12442613 ] no-reopen-closed, patch-avail [ 12467818 ]
        Jukka Zitting made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Jukka Zitting added a comment -

        Committed a slightly modified version of the patch in revision 792211. Merged to the 1.x branch in revision 792212.

        Cluster synchronization is now enabled by default for Session.refresh(boolean). The synch can be selectively disabled per session by setting the "org.apache.jackrabbit.disableClusterSyncOnRefresh" attribute to any non-null value. Alternatively a SessionImpl subclass can disable (or enable) this feature for all refresh() calls by overriding the protected clusterSyncOnRefresh() method.

        Show
        Jukka Zitting added a comment - Committed a slightly modified version of the patch in revision 792211. Merged to the 1.x branch in revision 792212. Cluster synchronization is now enabled by default for Session.refresh(boolean). The synch can be selectively disabled per session by setting the "org.apache.jackrabbit.disableClusterSyncOnRefresh" attribute to any non-null value. Alternatively a SessionImpl subclass can disable (or enable) this feature for all refresh() calls by overriding the protected clusterSyncOnRefresh() method.
        Jukka Zitting made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 1.6.0 [ 12313459 ]
        Resolution Fixed [ 1 ]
        Jukka Zitting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Jukka Zitting
            Reporter:
            Micah Whitacre
          • Votes:
            1 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development