Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: jackrabbit-core
    • Labels:
      None

      Description

      Currently the data store garbage collection needs to be run manually. It should be simpler to use (maybe tool based), or automatic.

        Activity

        Hide
        Thomas Mueller added a comment -

        Revision 576314: added a test case and fixed a bug: The garbage collecter deleted too many files when not closing the repository first

        Show
        Thomas Mueller added a comment - Revision 576314: added a test case and fixed a bug: The garbage collecter deleted too many files when not closing the repository first
        Hide
        Thomas Mueller added a comment -

        To better support garbage collection for the data store, I suggest to add a new method to AbstractBundlePersistenceManager:

        /**

        • Get all node ids.
        • A typical application will call this method multiple times, where 'after'
        • is the last row read. The maxCount parameter defines the maximum number of
        • node ids returned, 0 meaning no limit. The order of the node ids is specific for the
        • given persistent manager. Items that are added concurrently may not be included.
        • @param after the lower limit, or null for no limit.
        • @param maxCount the maximum number of node ids to return, or 0 for no limit.
        • @return an iterator of all bundles.
        • @throws ItemStateException if an error while loading occurs.
          */
          public abstract NodeIdIterator getAllNodeIds(NodeId after, int maxCount)
          throws ItemStateException;

        Only for the Bundle PersistenceManagers, because those persistence managers are the most important ones (in my view).

        This method is then called from the garbage collection process (or from a background thread from time to time, with a low maxCount and with enough sleep time in between). After all nodes are processed, the objects in the data store that were never scanned are deleted. This mechanism is better than the current mechanism as it can be restarted: only the last visited node needs to be persisted. It is also more efficient as the persistence manager can return the data in the order it is stored (which is easy for BundleFsPersistenceManager).

        What do you think, is this approach OK?
        Thomas

        Show
        Thomas Mueller added a comment - To better support garbage collection for the data store, I suggest to add a new method to AbstractBundlePersistenceManager: /** Get all node ids. A typical application will call this method multiple times, where 'after' is the last row read. The maxCount parameter defines the maximum number of node ids returned, 0 meaning no limit. The order of the node ids is specific for the given persistent manager. Items that are added concurrently may not be included. @param after the lower limit, or null for no limit. @param maxCount the maximum number of node ids to return, or 0 for no limit. @return an iterator of all bundles. @throws ItemStateException if an error while loading occurs. */ public abstract NodeIdIterator getAllNodeIds(NodeId after, int maxCount) throws ItemStateException; Only for the Bundle PersistenceManagers, because those persistence managers are the most important ones (in my view). This method is then called from the garbage collection process (or from a background thread from time to time, with a low maxCount and with enough sleep time in between). After all nodes are processed, the objects in the data store that were never scanned are deleted. This mechanism is better than the current mechanism as it can be restarted: only the last visited node needs to be persisted. It is also more efficient as the persistence manager can return the data in the order it is stored (which is easy for BundleFsPersistenceManager). What do you think, is this approach OK? Thomas
        Hide
        Marcel Reutegger added a comment -

        > The order of the node ids is specific for the given persistent manager.

        Doesn't this mean that an implementation has to retain the order of the bundles it stores? E.g. for a very simple in memory PM you would have to use a LinkedHashMap, because a HashMap doesn't work.

        Show
        Marcel Reutegger added a comment - > The order of the node ids is specific for the given persistent manager. Doesn't this mean that an implementation has to retain the order of the bundles it stores? E.g. for a very simple in memory PM you would have to use a LinkedHashMap, because a HashMap doesn't work.
        Hide
        Thomas Mueller added a comment -

        It does not need to be the insertion order, it can be any order. For BundleFsPersistenceManager the easiest approach would be sort by UUID. For BundleDbPersistenceManager it could be UUID as well, or anything where an index exists. For an in-memory persistence manager HashMap would not always work because the table could be resized. LinkedHashMap would work but not efficiently (you can't get an iterator starting from a certain point). SortedMap would be better (using the tailMap method).

        Show
        Thomas Mueller added a comment - It does not need to be the insertion order, it can be any order. For BundleFsPersistenceManager the easiest approach would be sort by UUID. For BundleDbPersistenceManager it could be UUID as well, or anything where an index exists. For an in-memory persistence manager HashMap would not always work because the table could be resized. LinkedHashMap would work but not efficiently (you can't get an iterator starting from a certain point). SortedMap would be better (using the tailMap method).
        Hide
        Thomas Mueller added a comment -

        Revision 577297: Add AbstractBundlePersistenceManager.getAllNodeIds

        Show
        Thomas Mueller added a comment - Revision 577297: Add AbstractBundlePersistenceManager.getAllNodeIds
        Hide
        Martijn Hendriks added a comment -

        I used the MSSQL bundle PM for the unit tests, and found that the test added last wednesday fails for this PM. For the default derby PM it does not fail.

        testGetAllNodeIds(org.apache.jackrabbit.core.data.PersistenceManagerIteratorTest) Time elapsed: 0.25 sec <<< FAILURE!
        junit.framework.AssertionFailedError: expected:<-1> but was:<1>
        at junit.framework.Assert.fail(Assert.java:47)
        at junit.framework.Assert.failNotEquals(Assert.java:282)
        at junit.framework.Assert.assertEquals(Assert.java:64)
        at junit.framework.Assert.assertEquals(Assert.java:201)
        at junit.framework.Assert.assertEquals(Assert.java:207)
        at org.apache.jackrabbit.core.data.PersistenceManagerIteratorTest.testGetAllNodeIds(PersistenceManagerIteratorTest.java:81)
        at org.apache.jackrabbit.core.data.PersistenceManagerIteratorTest.testGetAllNodeIds(PersistenceManagerIteratorTest.java:81)

        Show
        Martijn Hendriks added a comment - I used the MSSQL bundle PM for the unit tests, and found that the test added last wednesday fails for this PM. For the default derby PM it does not fail. testGetAllNodeIds(org.apache.jackrabbit.core.data.PersistenceManagerIteratorTest) Time elapsed: 0.25 sec <<< FAILURE! junit.framework.AssertionFailedError: expected:<-1> but was:<1> at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:282) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:201) at junit.framework.Assert.assertEquals(Assert.java:207) at org.apache.jackrabbit.core.data.PersistenceManagerIteratorTest.testGetAllNodeIds(PersistenceManagerIteratorTest.java:81) at org.apache.jackrabbit.core.data.PersistenceManagerIteratorTest.testGetAllNodeIds(PersistenceManagerIteratorTest.java:81)
        Hide
        Thomas Mueller added a comment -

        Thanks for reporting this! Silly me, I did not follow my own definition of next():
        "The order of the node ids is specific for the given persistent manager."
        So I can't just assert ascending order of UUIDs...
        Still strange that it seems to work with other databases. I will test that.

        Show
        Thomas Mueller added a comment - Thanks for reporting this! Silly me, I did not follow my own definition of next(): "The order of the node ids is specific for the given persistent manager." So I can't just assert ascending order of UUIDs... Still strange that it seems to work with other databases. I will test that.
        Hide
        Thomas Mueller added a comment -

        Hi Martijn,
        If you still have the jcr.log file (usually this is append-only), could you open it and find the place where the test faild with MS SQL Server? The test writes debug info, and I like to check if it really orders the other way. With Derby, I get:

        ...
        f8425869-0c8f-4e1a-a3e3-d0631d895e34
        f8c06d0c-2233-4e65-bbe3-d5352df9eb93
        072a60c1-0f11-4b96-9dbc-bbe393611869
        16dba3fe-5521-449f-8f35-a4011e5bf3a6
        2093e0c0-d0e8-4282-a0d1-955739278bb8
        ...

        I guess MS SQL Server orders like this:
        072a60c1-0f11-4b96-9dbc-bbe393611869
        16dba3fe-5521-449f-8f35-a4011e5bf3a6
        2093e0c0-d0e8-4282-a0d1-955739278bb8
        ...
        f8425869-0c8f-4e1a-a3e3-d0631d895e34
        f8c06d0c-2233-4e65-bbe3-d5352df9eb93

        Thanks!
        Thomas

        Show
        Thomas Mueller added a comment - Hi Martijn, If you still have the jcr.log file (usually this is append-only), could you open it and find the place where the test faild with MS SQL Server? The test writes debug info, and I like to check if it really orders the other way. With Derby, I get: ... f8425869-0c8f-4e1a-a3e3-d0631d895e34 f8c06d0c-2233-4e65-bbe3-d5352df9eb93 072a60c1-0f11-4b96-9dbc-bbe393611869 16dba3fe-5521-449f-8f35-a4011e5bf3a6 2093e0c0-d0e8-4282-a0d1-955739278bb8 ... I guess MS SQL Server orders like this: 072a60c1-0f11-4b96-9dbc-bbe393611869 16dba3fe-5521-449f-8f35-a4011e5bf3a6 2093e0c0-d0e8-4282-a0d1-955739278bb8 ... f8425869-0c8f-4e1a-a3e3-d0631d895e34 f8c06d0c-2233-4e65-bbe3-d5352df9eb93 Thanks! Thomas
        Hide
        Thomas Mueller added a comment -

        Revision 583413: The garbage collection deleted transient objects if the call sequence was:

        Session 1: add a blob (still in transient space)
        Session 2: start garbage collection
        Session 2: garbage collection completed, delete unused objects
        Session 1: store the changes

        This change adds a WeakReference map to make sure transient objects are not deleted.
        A test case is included.

        Show
        Thomas Mueller added a comment - Revision 583413: The garbage collection deleted transient objects if the call sequence was: Session 1: add a blob (still in transient space) Session 2: start garbage collection Session 2: garbage collection completed, delete unused objects Session 1: store the changes This change adds a WeakReference map to make sure transient objects are not deleted. A test case is included.
        Hide
        Pablo Rios added a comment -

        Shouldn't transient objects be removed from the WeakReference map when they are persisted ?
        Could this responsibility be added to the AbstractBundlePersistenceManager and called from the store(ChangeLog) method ?

        Show
        Pablo Rios added a comment - Shouldn't transient objects be removed from the WeakReference map when they are persisted ? Could this responsibility be added to the AbstractBundlePersistenceManager and called from the store(ChangeLog) method ?
        Hide
        Thomas Mueller added a comment -

        Transient objects are actually removed: when garbage collected.
        That's why a WeakReference map is used and not a regular map.
        There is no need to write code to remove them.
        Or is there a problem with this approach?

        Show
        Thomas Mueller added a comment - Transient objects are actually removed: when garbage collected. That's why a WeakReference map is used and not a regular map. There is no need to write code to remove them. Or is there a problem with this approach?
        Hide
        Pablo Rios added a comment -

        >Or is there a problem with this approach?
        No, everything works fine !

        Show
        Pablo Rios added a comment - >Or is there a problem with this approach? No, everything works fine !
        Hide
        Jukka Zitting added a comment -

        What's the status of this issue, i.e. any chance of resolving this soon?

        Show
        Jukka Zitting added a comment - What's the status of this issue, i.e. any chance of resolving this soon?
        Hide
        Thomas Mueller added a comment -

        I will check in my changes today. In my view this item is then resolved.

        Show
        Thomas Mueller added a comment - I will check in my changes today. In my view this item is then resolved.
        Hide
        Thomas Mueller added a comment -

        With revision 604872, the code to run the data store garbage collection is:

        GarbageCollector gc = ((SessionImpl)session).createDataStoreGarbageCollector();
        gc.scan();
        gc.stopScan();
        gc.deleteUnused();

        The garbage collector will iterate over all nodes in the repository and update the last modified date. If all persistence managers implement the IterablePersistenceManager interface, this mechanism will be used; if not, the garbage collector will scan the repository using the JCR API starting from the root node.

        Show
        Thomas Mueller added a comment - With revision 604872, the code to run the data store garbage collection is: GarbageCollector gc = ((SessionImpl)session).createDataStoreGarbageCollector(); gc.scan(); gc.stopScan(); gc.deleteUnused(); The garbage collector will iterate over all nodes in the repository and update the last modified date. If all persistence managers implement the IterablePersistenceManager interface, this mechanism will be used; if not, the garbage collector will scan the repository using the JCR API starting from the root node.
        Hide
        Thomas Mueller added a comment -

        Implemented in revision 604881

        Show
        Thomas Mueller added a comment - Implemented in revision 604881

          People

          • Assignee:
            Thomas Mueller
            Reporter:
            Thomas Mueller
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development