Jackrabbit Oak
  1. Jackrabbit Oak
  2. OAK-960

Enable session refresh state coordination between multiple session in single thread

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.10
    • Component/s: core, jcr
    • Labels:
      None

      Description

      To support certain scenarios like

      • Logging session operation being performed
      • Coordinating refresh state among multiple session used in same thread

      It would be helpful to expose a SessionOperationInterceptor which can be invoked by the SessionDelegate before and after the operation is performed.

      1. OAK-960-refresh-manager.patch
        14 kB
        Michael Dürig
      2. OAK-960.patch
        7 kB
        Chetan Mehrotra

        Issue Links

          Activity

          Hide
          Chetan Mehrotra added a comment -

          Initial proposal

          Show
          Chetan Mehrotra added a comment - Initial proposal
          Hide
          Jukka Zitting added a comment -

          The way it's defined in the patch ("only for reading purposes"), the SessionOperationInterceptor interface would not allow the second goal mentioned in the summary.

          Also, I'm a bit worried about adding a generic extension point to a pretty performance-sensitive place like this. Even a single property access in such an interceptor could have massive performance implications, as it would essentially double the cost of accessing content.

          Would it be better to split this into two simpler tasks, one for extra logging and another for the mentioned refresh coordination? We could handle those (and other similar needs) separately and thus have a better idea of the implications.

          Show
          Jukka Zitting added a comment - The way it's defined in the patch ("only for reading purposes"), the SessionOperationInterceptor interface would not allow the second goal mentioned in the summary. Also, I'm a bit worried about adding a generic extension point to a pretty performance-sensitive place like this. Even a single property access in such an interceptor could have massive performance implications, as it would essentially double the cost of accessing content. Would it be better to split this into two simpler tasks, one for extra logging and another for the mentioned refresh coordination? We could handle those (and other similar needs) separately and thus have a better idea of the implications.
          Hide
          Chetan Mehrotra added a comment -

          The way it's defined in the patch ("only for reading purposes"), the SessionOperationInterceptor interface would not allow the second goal mentioned in the summary.

          The comments were bit misleading. By reading I meant that Interceptor should not perform content access and instead should only access data like type of SessionOperation e.g. Property set operation can return a toString message with what property is being set and that information is accessible to the interceptor.

          Further to achieve the second case the interceptor would check if operation performed is of type update then for other session operation happening in that thread it would invoke SessionDelegate.refreshAtNextAccess. See SessionSynchronizer.java (attached) for one such implementation

          I understand that it is invoked in a critical part and has to be implemented carefully. Once we can validate that such interceptor solves the refresh related problem then we can refactor the code to make it more safe

          Show
          Chetan Mehrotra added a comment - The way it's defined in the patch ("only for reading purposes"), the SessionOperationInterceptor interface would not allow the second goal mentioned in the summary. The comments were bit misleading. By reading I meant that Interceptor should not perform content access and instead should only access data like type of SessionOperation e.g. Property set operation can return a toString message with what property is being set and that information is accessible to the interceptor. Further to achieve the second case the interceptor would check if operation performed is of type update then for other session operation happening in that thread it would invoke SessionDelegate.refreshAtNextAccess. See SessionSynchronizer.java (attached) for one such implementation I understand that it is invoked in a critical part and has to be implemented carefully. Once we can validate that such interceptor solves the refresh related problem then we can refactor the code to make it more safe
          Hide
          Jukka Zitting added a comment -

          OK. It would be good if the allowed side-effects were listed in the SessionOperationInterceptor interface, preferably as explicit input/output parameters so that it's clear what the effects of calling the before() and after() methods can be. See for example the CommitHook and Editor/EditorProvider interfaces that identify the allowed inputs and outputs of the implementing classes.

          SessionSynchronizer

          In general the idea is good, though I'd love to see a simpler implementation as I'm not sure I follow all the details here.

          For example the built-in servlet filter part seems excessive. Instead I'd go for a configurable flag like the existing refresh interval setting, and if needed, have a completely separate servlet filter be in charge of setting that flag.

          I'm not even sure whether there's a need for having this kind of refresh synchronization for HTTP requests but not other threads, why not simply do it for all use cases as interleaving session accesses in a single thread is in any case a potential backwards compatibility issue?

          Show
          Jukka Zitting added a comment - OK. It would be good if the allowed side-effects were listed in the SessionOperationInterceptor interface, preferably as explicit input/output parameters so that it's clear what the effects of calling the before() and after() methods can be. See for example the CommitHook and Editor / EditorProvider interfaces that identify the allowed inputs and outputs of the implementing classes. SessionSynchronizer In general the idea is good, though I'd love to see a simpler implementation as I'm not sure I follow all the details here. For example the built-in servlet filter part seems excessive. Instead I'd go for a configurable flag like the existing refresh interval setting, and if needed, have a completely separate servlet filter be in charge of setting that flag. I'm not even sure whether there's a need for having this kind of refresh synchronization for HTTP requests but not other threads, why not simply do it for all use cases as interleaving session accesses in a single thread is in any case a potential backwards compatibility issue?
          Hide
          Michael Dürig added a comment -

          As discussed off line with Chetan we should not expose SessionDelegate to SessionOperationInterceptor call backs. Instead we should make SessionDelegate implement a new (to be named) interface that we expose to such call backs and that exposes only the necessary methods. This allows us to evolve SessionDelegate and the interceptor mechanism separately.

          Show
          Michael Dürig added a comment - As discussed off line with Chetan we should not expose SessionDelegate to SessionOperationInterceptor call backs. Instead we should make SessionDelegate implement a new (to be named) interface that we expose to such call backs and that exposes only the necessary methods. This allows us to evolve SessionDelegate and the interceptor mechanism separately.
          Hide
          Michael Dürig added a comment -

          OTOH why couldn't we just use a thread local in Oak to track this. Something along the lines of https://github.com/mduerig/jackrabbit-oak/commit/1ec61c778396b99adfacca66454ed79e2c215c03? This approach covers all cases where the repository is accessed though multiple sessions from within the same thread, not only HTTP requests.

          Note that instead of using a static field in SessionDelegate the tread local could also move to a regular instance variable in RepositoryImpl.

          I understand that there is also the logging use case for the interceptor. However, I'd rather separate those two concerns and provide some means specifically targeted to logging.

          Show
          Michael Dürig added a comment - OTOH why couldn't we just use a thread local in Oak to track this. Something along the lines of https://github.com/mduerig/jackrabbit-oak/commit/1ec61c778396b99adfacca66454ed79e2c215c03? This approach covers all cases where the repository is accessed though multiple sessions from within the same thread, not only HTTP requests. Note that instead of using a static field in SessionDelegate the tread local could also move to a regular instance variable in RepositoryImpl . I understand that there is also the logging use case for the interceptor. However, I'd rather separate those two concerns and provide some means specifically targeted to logging.
          Hide
          Chetan Mehrotra added a comment -

          Something along the lines of https://github.com/mduerig/jackrabbit-oak/commit/1ec61c778396b99adfacca66454ed79e2c215c03?

          Interesting approach. One problem I see if supporting cases where more than two sessions are involved. Consider a case involving three sessions s1,s2,s3 invoked in following order

          1. s1 - Does update. LAST_UPDATER = s1
          2. s2 - Does read. Sees the change and cleans the flag. LAST_UPDATER = null
          3. s3 - Does read. Checks LAST_UPDATER. Sees null and hence does not performs a refresh

          Hence in the SessionSynchronizer [1] I had to maintain a memory of the state per session. And the state is cleared only upon request completion. One workaround can be that we use a Map<Integer,Integer> to track the state such that thread local variable does not have a hard reference to Oak specific classes (only referes to JDK classes). This would avoid introducing a memory leak. Just that we might pollute some thread (from pools) with small map. We can ensure that map sizes are trimmed etc.

          Further there is no proper way [2] to cleanup ThreadLocal values except from within the thread that put them in there in the first place. So for such proper cleanups you do require an interception point from where all execution code passes. For HTTP Request Servlet Filter serves that purpose. There are ways using reflection [3]

          [1] https://gist.github.com/chetanmeh/6315991
          [2] http://wiki.apache.org/tomcat/MemoryLeakProtection#ThreadLocal_leaks
          [3] https://weblogs.java.net/blog/jjviana/archive/2010/06/09/dealing-glassfish-301-memory-leak-or-threadlocal-thread-pool-bad-ide

          Show
          Chetan Mehrotra added a comment - Something along the lines of https://github.com/mduerig/jackrabbit-oak/commit/1ec61c778396b99adfacca66454ed79e2c215c03? Interesting approach. One problem I see if supporting cases where more than two sessions are involved. Consider a case involving three sessions s1,s2,s3 invoked in following order s1 - Does update. LAST_UPDATER = s1 s2 - Does read. Sees the change and cleans the flag. LAST_UPDATER = null s3 - Does read. Checks LAST_UPDATER. Sees null and hence does not performs a refresh Hence in the SessionSynchronizer [1] I had to maintain a memory of the state per session. And the state is cleared only upon request completion. One workaround can be that we use a Map<Integer,Integer> to track the state such that thread local variable does not have a hard reference to Oak specific classes (only referes to JDK classes). This would avoid introducing a memory leak. Just that we might pollute some thread (from pools) with small map. We can ensure that map sizes are trimmed etc. Further there is no proper way [2] to cleanup ThreadLocal values except from within the thread that put them in there in the first place. So for such proper cleanups you do require an interception point from where all execution code passes. For HTTP Request Servlet Filter serves that purpose. There are ways using reflection [3] [1] https://gist.github.com/chetanmeh/6315991 [2] http://wiki.apache.org/tomcat/MemoryLeakProtection#ThreadLocal_leaks [3] https://weblogs.java.net/blog/jjviana/archive/2010/06/09/dealing-glassfish-301-memory-leak-or-threadlocal-thread-pool-bad-ide
          Hide
          Michael Dürig added a comment -

          supporting cases where more than two sessions are involved...

          Good catch! We should be able to deal with that by putting the updateCount into the thread local on update operations and having each session delegate compare its own value against that value to detect whether it needs to refresh. This is the same pattern we successfully use in many other places already.

          there is no proper way to cleanup ThreadLocal values...

          Yes but this shouldn't do any harm here as we only put instances of JDK classes into the thread local. The worst thing that AFAICS can happen is a reused thread (from a pool) causing a refresh to a session that would not refresh if running off a fresh thread. I.e updates could leak through causing spurious refreshes.

          Show
          Michael Dürig added a comment - supporting cases where more than two sessions are involved... Good catch! We should be able to deal with that by putting the updateCount into the thread local on update operations and having each session delegate compare its own value against that value to detect whether it needs to refresh. This is the same pattern we successfully use in many other places already. there is no proper way to cleanup ThreadLocal values... Yes but this shouldn't do any harm here as we only put instances of JDK classes into the thread local. The worst thing that AFAICS can happen is a reused thread (from a pool) causing a refresh to a session that would not refresh if running off a fresh thread. I.e updates could leak through causing spurious refreshes.
          Hide
          Michael Dürig added a comment -

          It just occurred to me that tracking the number of saves should be sufficient. No need to track each update. See https://github.com/mduerig/jackrabbit-oak/commit/e31647d3f4328072da0711d8371298d69b5abce0 for a draft of this approach.

          Show
          Michael Dürig added a comment - It just occurred to me that tracking the number of saves should be sufficient. No need to track each update. See https://github.com/mduerig/jackrabbit-oak/commit/e31647d3f4328072da0711d8371298d69b5abce0 for a draft of this approach.
          Hide
          Chetan Mehrotra added a comment -

          That should work!! I was storing something similar to saveCount in threadlocal map which can be saved as an instance variable. Hence we only incur the cost of adding one integer to Threadlocal map.

          So we can do away with interceptor. Just that I wanted to use the same interceptor to try a scenario where instead of maintaining a thread local map I would maintain the state in a global variable effectively causing automatic session refresh for all session bringing it at par with JR2. But may be we do not want to do that!!

          btw you would still provide me a way for logging interception

          Show
          Chetan Mehrotra added a comment - That should work!! I was storing something similar to saveCount in threadlocal map which can be saved as an instance variable. Hence we only incur the cost of adding one integer to Threadlocal map. So we can do away with interceptor. Just that I wanted to use the same interceptor to try a scenario where instead of maintaining a thread local map I would maintain the state in a global variable effectively causing automatic session refresh for all session bringing it at par with JR2. But may be we do not want to do that!! btw you would still provide me a way for logging interception
          Hide
          Chetan Mehrotra added a comment -

          Removed the SessionOperationInterceptor and used the approach suggested by Michael Duerig with revision 1517503

          Show
          Chetan Mehrotra added a comment - Removed the SessionOperationInterceptor and used the approach suggested by Michael Duerig with revision 1517503
          Hide
          Michael Dürig added a comment -

          Thanks for taking care of this.

          One thing we should probably do is to move SessionDelegate#SAVE_COUNT to RepositoryImpl and make it non static. Otherwise refreshes of multiple repositories in the same JVM would interfere.

          Since the whole session refresh logic gets ever more complex, it might even make sense to factor it into a separate SessionRefreshManager class and pass an instance of that to each session delegate. This would decouple the refresh strategy from the rest of the code. WDYT?

          Show
          Michael Dürig added a comment - Thanks for taking care of this. One thing we should probably do is to move SessionDelegate#SAVE_COUNT to RepositoryImpl and make it non static. Otherwise refreshes of multiple repositories in the same JVM would interfere. Since the whole session refresh logic gets ever more complex, it might even make sense to factor it into a separate SessionRefreshManager class and pass an instance of that to each session delegate. This would decouple the refresh strategy from the rest of the code. WDYT?
          Hide
          Michael Dürig added a comment -

          Proposed patch for factoring the refresh logic into its own class

          Show
          Michael Dürig added a comment - Proposed patch for factoring the refresh logic into its own class
          Hide
          Michael Dürig added a comment -

          I (somewhat accidentally) applied the patch at http://svn.apache.org/r1517759. I can still revert if this causes issues.

          Show
          Michael Dürig added a comment - I (somewhat accidentally) applied the patch at http://svn.apache.org/r1517759 . I can still revert if this causes issues.
          Hide
          Jukka Zitting added a comment -

          Looks good so far! Some comments:

          • I propose to rename SessionRefreshManager to RefreshStrategy and model it after the Strategy pattern. That way we can simplify the complex multilevel conditional in refreshIfNecessary() with a composition of simpler refresh strategies. And something like a no-op RefreshStrategy.EXPLICIT instance could be used to avoid all the extra steps for an Oak-aware client that doesn't need the auto-refresh functionality.
          • For better encapsulation it would be better if refreshIfNecessary() was renamed to something like isRefreshNecessary() and the SessionDelegate argument removed. That way the refresh strategy only needs to know whether a refresh is needed, not how it is done.
          Show
          Jukka Zitting added a comment - Looks good so far! Some comments: I propose to rename SessionRefreshManager to RefreshStrategy and model it after the Strategy pattern. That way we can simplify the complex multilevel conditional in refreshIfNecessary() with a composition of simpler refresh strategies. And something like a no-op RefreshStrategy.EXPLICIT instance could be used to avoid all the extra steps for an Oak-aware client that doesn't need the auto-refresh functionality. For better encapsulation it would be better if refreshIfNecessary() was renamed to something like isRefreshNecessary() and the SessionDelegate argument removed. That way the refresh strategy only needs to know whether a refresh is needed, not how it is done.
          Hide
          Michael Dürig added a comment -

          Ack:

          • At revision 1517849 I addressed the isRefreshNecessary concern as suggested.
          • At revision 1517850 I removed warnings for long lived sessions if these will be refreshed with the in thread sync. mechanism. In a further step we should replace this warning through JMX monitoring. See OAK-941.
          • At revision 1517851 I moved RefreshManager up to the jcr package as a preparation for further refactorings to the mentioned strategy pattern. See TODO in the code.
          Show
          Michael Dürig added a comment - Ack: At revision 1517849 I addressed the isRefreshNecessary concern as suggested. At revision 1517850 I removed warnings for long lived sessions if these will be refreshed with the in thread sync. mechanism. In a further step we should replace this warning through JMX monitoring. See OAK-941 . At revision 1517851 I moved RefreshManager up to the jcr package as a preparation for further refactorings to the mentioned strategy pattern. See TODO in the code.
          Hide
          Michael Dürig added a comment -

          At revision 1518137 I set the auto refresh interval for sessions to Long.MAX_VALUE effectively disabling it. The new per thread session synchronisation mechanisms turns out to be the better approach making the former unnecessary.

          Show
          Michael Dürig added a comment - At revision 1518137 I set the auto refresh interval for sessions to Long.MAX_VALUE effectively disabling it. The new per thread session synchronisation mechanisms turns out to be the better approach making the former unnecessary.
          Hide
          Michael Dürig added a comment -

          At revision 1519422 I sliced the RefreshManager up into a composite of {{RefreshStrategy}}s.

          Show
          Michael Dürig added a comment - At revision 1519422 I sliced the RefreshManager up into a composite of {{RefreshStrategy}}s.
          Hide
          Chetan Mehrotra added a comment -

          With support for RefreshStrategy done this issue can be considered fixed as now two session participating in same thread get updated

          Show
          Chetan Mehrotra added a comment - With support for RefreshStrategy done this issue can be considered fixed as now two session participating in same thread get updated

            People

            • Assignee:
              Chetan Mehrotra
              Reporter:
              Chetan Mehrotra
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development