Jackrabbit Content Repository
  1. Jackrabbit Content Repository
  2. JCR-954

Allow to disable referential integrity checking for workspace

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.4, 1.5
    • Component/s: jackrabbit-core
    • Labels:
      None

      Description

      Some operations like clone, remove operating on huge subtree of nodes requires a lot of memory. To copy, clone, remove subtree all nodes are loaded into transient spaces. It allows such operations to be transactional, from other side it requires a lot of heap size and this memory size is directly dependent on the size of subtree (number of nodes). In result of this in some cases it is impossible to make such operations in one step. In our environment sometimes 1 GB of java heap is not enough to succesfully clone subtree from one workspace to another.

      You can always clone (copy, remove) tree in chunks, but if you have references between subtrees such approach fails. Possibilty of temporary disabling referential integrity checking for experienced JCR user could be very usefull then.

      Another use case is to allow to clone selected subtrees of the whole structure between worskpaces. In our application we need to clone only some selected subtrees from one workspace to another. But we can not do that because of existing references. We need to clone the whol estructure first, then remove all unwanted nodes, which is really time expensive and memory consuming.

      1. JCR-954-simple.diff
        2 kB
        Przemo Pakulski
      2. JCR-954-patch.txt
        13 kB
        Przemo Pakulski

        Activity

        Jukka Zitting made changes -
        Workflow jira [ 12405182 ] no-reopen-closed, patch-avail [ 12468350 ]
        Jukka Zitting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Jukka Zitting made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Jukka Zitting [ jukkaz ]
        Resolution Fixed [ 1 ]
        Hide
        Jukka Zitting added a comment -

        Committed the SISM changes and the proposed protected RepositoryImpl method to trunk in revision 632738. Merged the changes to the 1.3 branch in revision 632739.

        I'm reluctant to push this to the 1.4 branch at the moment as we're still making "pure" patch releases from there. Perhaps once 1.5 is out we can do a more relaxed 1.4.x release like we are currently doing with 1.3.4.

        Show
        Jukka Zitting added a comment - Committed the SISM changes and the proposed protected RepositoryImpl method to trunk in revision 632738. Merged the changes to the 1.3 branch in revision 632739. I'm reluctant to push this to the 1.4 branch at the moment as we're still making "pure" patch releases from there. Perhaps once 1.5 is out we can do a more relaxed 1.4.x release like we are currently doing with 1.3.4.
        Jukka N committed 632738 (2 files)
        Reviews: none

        JCR-954: Allow to disable referential integrity checking for workspace
            - Added protected RepositoryImpl.setReferentialIntegrityChecking method
              and supporting functionality in SharedItemStateManager
            - Based on patches by Przemo Pakulski

        Hide
        Przemo Pakulski added a comment -

        Could we also include patch in 1.4 maintenance branch ?

        Show
        Przemo Pakulski added a comment - Could we also include patch in 1.4 maintenance branch ?
        Hide
        Przemo Pakulski added a comment -

        +1, that will ensure that all internal classes (including SISM) and methods will be acessible in future relases

        Show
        Przemo Pakulski added a comment - +1, that will ensure that all internal classes (including SISM) and methods will be acessible in future relases
        Hide
        Stefan Guggisberg added a comment -

        > Would it be OK to have a protected method for this in RepositoryImpl? That would still require a subclass, but would minimize the amount of coupling that such a subclass needs with Jackrabbit internals.

        +1, that would be fine with me. thanks!

        Show
        Stefan Guggisberg added a comment - > Would it be OK to have a protected method for this in RepositoryImpl? That would still require a subclass, but would minimize the amount of coupling that such a subclass needs with Jackrabbit internals. +1, that would be fine with me. thanks!
        Hide
        Jukka Zitting added a comment -

        Would it be OK to have a protected method for this in RepositoryImpl? That would still require a subclass, but would minimize the amount of coupling that such a subclass needs with Jackrabbit internals.

        Show
        Jukka Zitting added a comment - Would it be OK to have a protected method for this in RepositoryImpl? That would still require a subclass, but would minimize the amount of coupling that such a subclass needs with Jackrabbit internals.
        Hide
        Stefan Guggisberg added a comment -

        > How about including the method in a custom RepositoryImpl subclass included in o.a.j.core? This way nobody would be affected by default, but if you really needed this feature you could instantiate and use that subclass instead of the normal RepositoryImpl.

        -0

        IMO that would be still too easy and tempting to use. people might start using this 'feature' because they expect better performance.
        however, unless they know exactly what they're doing, they risk corrupting the repository. personally i'd prefer to expose this
        functionality to subclasses but people would have to write their own in order to enable it.

        Show
        Stefan Guggisberg added a comment - > How about including the method in a custom RepositoryImpl subclass included in o.a.j.core? This way nobody would be affected by default, but if you really needed this feature you could instantiate and use that subclass instead of the normal RepositoryImpl. -0 IMO that would be still too easy and tempting to use. people might start using this 'feature' because they expect better performance. however, unless they know exactly what they're doing, they risk corrupting the repository. personally i'd prefer to expose this functionality to subclasses but people would have to write their own in order to enable it.
        Hide
        Jukka Zitting added a comment -

        How about including the method in a custom RepositoryImpl subclass included in o.a.j.core? This way nobody would be affected by default, but if you really needed this feature you could instantiate and use that subclass instead of the normal RepositoryImpl.

        Show
        Jukka Zitting added a comment - How about including the method in a custom RepositoryImpl subclass included in o.a.j.core? This way nobody would be affected by default, but if you really needed this feature you could instantiate and use that subclass instead of the normal RepositoryImpl.
        Hide
        Stefan Guggisberg added a comment -

        i'd prefer to not publicly expose this method. doing so would IMO mean compromising a core feature of JSR-170.

        Show
        Stefan Guggisberg added a comment - i'd prefer to not publicly expose this method. doing so would IMO mean compromising a core feature of JSR-170.
        Hide
        Jukka Zitting added a comment -

        I'd be OK to include the setCheckIntegrityEnabled method in RepositoryImpl. It's one less customization needed on your side and other people might also find it useful (I know I would every now and then).

        Show
        Jukka Zitting added a comment - I'd be OK to include the setCheckIntegrityEnabled method in RepositoryImpl. It's one less customization needed on your side and other people might also find it useful (I know I would every now and then).
        Hide
        Przemo Pakulski added a comment -

        I first proposed patch I've added the following method to RepositoryImpl class :

        • Enables/disables referential integrity check for workspace.
        • @param workspaceName
        • @param checkIntegrityEnabled
        • @throws RepositoryException
          */
          public void setCheckIntegrityEnabled(String workspaceName, boolean checkIntegrityEnabled) throws RepositoryException;

        To keep the patch simple, I created own wrapper for RepositorImpl class and moved this method to custom RepositoryImpl implementation, so I'm able to enable/disable referential integrity checking for any workspace programatically.

        Show
        Przemo Pakulski added a comment - I first proposed patch I've added the following method to RepositoryImpl class : Enables/disables referential integrity check for workspace. @param workspaceName @param checkIntegrityEnabled @throws RepositoryException */ public void setCheckIntegrityEnabled(String workspaceName, boolean checkIntegrityEnabled) throws RepositoryException; To keep the patch simple, I created own wrapper for RepositorImpl class and moved this method to custom RepositoryImpl implementation, so I'm able to enable/disable referential integrity checking for any workspace programatically.
        Jukka Zitting made changes -
        Affects Version/s 1.3.3 [ 12312770 ]
        Affects Version/s 1.4 [ 12312447 ]
        Affects Version/s 1.5 [ 12312920 ]
        Affects Version/s 1.4.1 [ 12312919 ]
        Fix Version/s 1.4.1 [ 12312919 ]
        Hide
        Jukka Zitting added a comment -

        Agreed with Przemo. There's a real itch here and the proposed fix won't harm anyone, so unless someone comes up with another way to fix this we should not stand in the way.

        About the fix, how would you enable the no-referential-checks mode in practice? Should we add some route up to the RepositoryImpl or WorkspaceImpl level, or should the checkReferences flag perhaps be static?

        Show
        Jukka Zitting added a comment - Agreed with Przemo. There's a real itch here and the proposed fix won't harm anyone, so unless someone comes up with another way to fix this we should not stand in the way. About the fix, how would you enable the no-referential-checks mode in practice? Should we add some route up to the RepositoryImpl or WorkspaceImpl level, or should the checkReferences flag perhaps be static?
        Przemo Pakulski made changes -
        Attachment JCR-954-simple.diff [ 12376560 ]
        Hide
        Przemo Pakulski added a comment -

        Simpler version of patch attached :

        • no public api method,
        • no changes in functionality (integrity checking enabled by default),
        • just single flag with the setter in SharedItemStateManager class which allow to control this behaviour programatically by experienced developers.
        Show
        Przemo Pakulski added a comment - Simpler version of patch attached : no public api method, no changes in functionality (integrity checking enabled by default), just single flag with the setter in SharedItemStateManager class which allow to control this behaviour programatically by experienced developers.
        Hide
        Przemo Pakulski added a comment -

        Without such option it is not possible to clone, import neither remove relatively big subtrees of nodes at all.
        I really need such functionality, but nobody even tried to address the real issue since 9 months already.

        Show
        Przemo Pakulski added a comment - Without such option it is not possible to clone, import neither remove relatively big subtrees of nodes at all. I really need such functionality, but nobody even tried to address the real issue since 9 months already.
        Przemo Pakulski made changes -
        Affects Version/s 1.4.1 [ 12312919 ]
        Affects Version/s 1.5 [ 12312920 ]
        Priority Major [ 3 ] Minor [ 4 ]
        Affects Version/s 1.3.3 [ 12312770 ]
        Affects Version/s 1.4 [ 12312447 ]
        Fix Version/s 1.5 [ 12312920 ]
        Fix Version/s 1.3.4 [ 12312918 ]
        Fix Version/s 1.4.1 [ 12312919 ]
        Jukka Zitting made changes -
        Fix Version/s 1.4 [ 12312447 ]
        Hide
        Jukka Zitting added a comment -

        Dropping from 1.4. I think we definitely should support Przemo's use case, but with two -1s on table we either need more discussion or an alternative proposal for implementing this.

        Show
        Jukka Zitting added a comment - Dropping from 1.4. I think we definitely should support Przemo's use case, but with two -1s on table we either need more discussion or an alternative proposal for implementing this.
        Hide
        Przemo Pakulski added a comment -

        I can agree that such feature is rather workaround, but this solution is quick and really works. As I wrote it will be not recommended to use it, it is only for experience JCR users which will be aware of any possible drawbacks.

        From other point of view using any relational database you can also temporarly disable constraint checking to speedup some bulk operations, then enable it again. And in some cases it is very helpfull.

        >references are a core feature of jsr-170 which imo must not be compromised through public api methods.

        I dot't need it exposed through public API. What I need is to have some methods which I can call on any component (could be RepositoryImpl, or WorkspaceImpl).

        For now we have implemented this by extending SISM class and overriding some methods. But then our code is depenedent on Jackrabbit, and could stop work with newest versions.

        >Consider a big subtree of items (1 mio items, eg.), which you might want to delete. Just switching off integrity checks does not help here

        I think it helps, because you can remove tree in steps without worrying about references.

        >we should address the real issue instead.

        I really agree with you, but changing this could mean redesigning of jackrabbit core and it does not look that it could happen in the near future.

        Stefan, Felix, could you recommend any other feasible solution for my use cases?

        Show
        Przemo Pakulski added a comment - I can agree that such feature is rather workaround, but this solution is quick and really works. As I wrote it will be not recommended to use it, it is only for experience JCR users which will be aware of any possible drawbacks. From other point of view using any relational database you can also temporarly disable constraint checking to speedup some bulk operations, then enable it again. And in some cases it is very helpfull. >references are a core feature of jsr-170 which imo must not be compromised through public api methods. I dot't need it exposed through public API. What I need is to have some methods which I can call on any component (could be RepositoryImpl, or WorkspaceImpl). For now we have implemented this by extending SISM class and overriding some methods. But then our code is depenedent on Jackrabbit, and could stop work with newest versions. >Consider a big subtree of items (1 mio items, eg.), which you might want to delete. Just switching off integrity checks does not help here I think it helps, because you can remove tree in steps without worrying about references. >we should address the real issue instead. I really agree with you, but changing this could mean redesigning of jackrabbit core and it does not look that it could happen in the near future. Stefan, Felix, could you recommend any other feasible solution for my use cases?
        Hide
        Jukka Zitting added a comment -

        > we should address the real issue instead.

        Do we have a plan or even a vague idea of how and when we are going to solve that? In fact I do have some ideas in NGP for solving that, but they are way off in the future.

        I'm all for going for the root cause, but who will do it?

        Show
        Jukka Zitting added a comment - > we should address the real issue instead. Do we have a plan or even a vague idea of how and when we are going to solve that? In fact I do have some ideas in NGP for solving that, but they are way off in the future. I'm all for going for the root cause, but who will do it?
        Hide
        Stefan Guggisberg added a comment -

        the suggested solution is imo a hack to enable a workaround for the real issue at hand, i.e. in-memory changelog & transient changes

        we should address the real issue instead.

        Show
        Stefan Guggisberg added a comment - the suggested solution is imo a hack to enable a workaround for the real issue at hand, i.e. in-memory changelog & transient changes we should address the real issue instead.
        Hide
        Felix Meschberger added a comment -

        -1 for this change.

        I second the opinion by Stefan, that this would be a very bad idea.

        In fact the real issue is the transient items space which is growing due to the "big transaction". This is a big issue of the internal implementation of the item managers and cannot be solved by just switching off integrity checking. Consider a big subtree of items (1 mio items, eg.), which you might want to delete. Just switching off integrity checks does not help here.

        Show
        Felix Meschberger added a comment - -1 for this change. I second the opinion by Stefan, that this would be a very bad idea. In fact the real issue is the transient items space which is growing due to the "big transaction". This is a big issue of the internal implementation of the item managers and cannot be solved by just switching off integrity checking. Consider a big subtree of items (1 mio items, eg.), which you might want to delete. Just switching off integrity checks does not help here.
        Hide
        Jukka Zitting added a comment -

        Disabling the checks seems to be the only way for now to really achieve the use cases in question, so I wouldn't just deny this change. However, could we end up with some real internal issues for example if the NodeReferences structures inside a persistence store become incorrect?

        I wouldn't worry too much about inconsistencies visible to the client application(s) if they were knowingly injected by a client, but it is a problem if such inconsistencies would destabilize the repository itself.

        Show
        Jukka Zitting added a comment - Disabling the checks seems to be the only way for now to really achieve the use cases in question, so I wouldn't just deny this change. However, could we end up with some real internal issues for example if the NodeReferences structures inside a persistence store become incorrect? I wouldn't worry too much about inconsistencies visible to the client application(s) if they were knowingly injected by a client, but it is a problem if such inconsistencies would destabilize the repository itself.
        Hide
        Stefan Guggisberg added a comment -

        -1 for the suggested feature since it might lead to inconsistent data.

        references are a core feature of jsr-170 which imo must not be compromised through public api methods.

        Show
        Stefan Guggisberg added a comment - -1 for the suggested feature since it might lead to inconsistent data. references are a core feature of jsr-170 which imo must not be compromised through public api methods.
        Przemo Pakulski made changes -
        Field Original Value New Value
        Attachment JCR-954-patch.txt [ 12358574 ]
        Hide
        Przemo Pakulski added a comment -

        Attached patch containing simple solution in SharedItemStateManager class.
        You can disable/enable referential integrity checking by simply setting flag on workspace :

        ((JackrabbitWorkspace)workspace.setReferentialIntegrityChecking(false);

        ((JackrabbitWorkspace)workspace.setReferentialIntegrityChecking(true);

        Show
        Przemo Pakulski added a comment - Attached patch containing simple solution in SharedItemStateManager class. You can disable/enable referential integrity checking by simply setting flag on workspace : ((JackrabbitWorkspace)workspace.setReferentialIntegrityChecking(false); ((JackrabbitWorkspace)workspace.setReferentialIntegrityChecking(true);
        Przemo Pakulski created issue -

          People

          • Assignee:
            Jukka Zitting
            Reporter:
            Przemo Pakulski
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development