Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5
    • Component/s: clustering, jackrabbit-core
    • Labels:
      None
    • Environment:
      A clustered Jackrabbit

      Description

      The revision table to which cluster nodes write their changes can become very large. If all cluster nodes are up to date with a certain revision number, it seems unnecessary to keep the revisions with a lower number.

      1. JCR-1087-v2.patch
        37 kB
        Martijn Hendriks
      2. JCR-1087.patch
        26 kB
        Martijn Hendriks
      3. cluster-trace.txt
        2 kB
        Martijn Hendriks

          Activity

          Martijn Hendriks added a comment -

          I'm afraid there's no documentation yet. I'll try to add it soon.

          Thomas Mueller added a comment -

          I couldn't find any documentation for this feature at: http://wiki.apache.org/jackrabbit/Clustering

          Is there any documentation? So far I only added a link to here (Removing Old Revisions)

          Martijn Hendriks added a comment -

          Committed in revision 628697.

          The instance revision on the local file system is automatically migrated to the database (to the LOCAL_REVISIONS table). The clean-up thread is not started by default.

          Known caveats of the current solution:

          • The user must make sure that all cluster nodes have written their local revision to the database before the clean-up thread runs for the first time. Otherwise, cluster nodes might miss updates (because those updates have already been purged), and their local caches and search indexes will get out of sync.
          • If a cluster node is removed permanently from the cluster, then its entry in the LOCAL_REVISIONS table should be removed manually. Otherwise, the clean-up thread will not be effective.
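
The second caveat can be illustrated with a small in-memory model. This is only a sketch, not the committed code: the class `StaleEntrySketch`, the method `purge`, and the node names are hypothetical, and a `TreeSet` and a `Map` stand in for the JOURNAL and LOCAL_REVISIONS tables. It assumes the clean-up removes every journal revision strictly below the smallest recorded local revision.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory model of the clean-up; not Jackrabbit code.
class StaleEntrySketch {

    /**
     * Removes all journal revisions strictly below the smallest local
     * revision and returns how many were removed.
     */
    static int purge(TreeSet<Long> journal, Map<String, Long> localRevisions) {
        if (localRevisions.isEmpty()) {
            return 0; // no node has registered, so nothing is known to be safe
        }
        long min = Collections.min(localRevisions.values());
        SortedSet<Long> purgeable = journal.headSet(min); // live view, exclusive of min
        int removed = purgeable.size();
        purgeable.clear(); // clearing the view removes the entries from the journal
        return removed;
    }

    public static void main(String[] args) {
        TreeSet<Long> journal = new TreeSet<>();
        for (long r = 1; r <= 10; r++) {
            journal.add(r);
        }
        Map<String, Long> local = new HashMap<>();
        local.put("nodeA", 10L);
        local.put("nodeB", 9L);
        local.put("retiredNode", 3L); // node was removed, entry was left behind

        System.out.println("purged with stale entry: " + purge(journal, local));
        local.remove("retiredNode"); // the manual clean-up described above
        System.out.println("purged after removing it: " + purge(journal, local));
    }
}
```

As long as the entry of the retired node stays in place, its old revision pins the minimum and the journal keeps growing behind it.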
          Martijn Hendriks added a comment -

          Hi all,

          Unfortunately I've been inactive for a while, but now I have more time to work on Jackrabbit, which is good. I created a second patch for this issue which also addresses the upgrade scenario that Dominique mentioned:

          • Added the LOCAL_REVISIONS table to the create scripts (*.ddl)
          • Added InstanceRevision interface
          • The InstanceRevision is now retrieved through the Journal instance
          • Added logic to the DatabaseJournal to (i) migrate to a db-based InstanceRevision,
            and (ii) start a janitor thread for cleaning up old cluster revision entries

          I've tested the patch only on MSSQL, MySQL and Oracle, because I don't have access to the other databases.

          I don't really like the solution for the upgrade scenario (a .ddl file is scanned for the line that creates the LOCAL_REVISIONS table), but I like the alternative of having twice as many .ddl files even less. But maybe there's a third way...?

          Best regards, Martijn
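
For readers who want to picture the upgrade approach mentioned in this comment, here is a minimal sketch of scanning DDL text for the line that creates a given table. It is not taken from the patch: the class `DdlScanSketch`, the method `findCreateStatement`, and the sample DDL in `main` are all hypothetical, and the sketch assumes one statement per line.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper; the actual logic in the patch may differ.
class DdlScanSketch {

    /**
     * Returns the first line of the DDL text that starts with "create table"
     * and mentions the given table name (case-insensitive), if any.
     */
    static Optional<String> findCreateStatement(String ddl, String tableName) {
        String wanted = tableName.toLowerCase(Locale.ROOT);
        for (String line : ddl.split("\\R")) { // split on any line break
            String normalized = line.trim().toLowerCase(Locale.ROOT);
            if (normalized.startsWith("create table") && normalized.contains(wanted)) {
                return Optional.of(line.trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Sample DDL invented for illustration; column definitions are assumptions.
        String ddl = "create table JOURNAL (REVISION_ID BIGINT NOT NULL)\n"
                + "create table LOCAL_REVISIONS (JOURNAL_ID varchar(255) NOT NULL)\n";
        System.out.println(findCreateStatement(ddl, "LOCAL_REVISIONS").orElse("not found"));
    }
}
```

The statement found this way could then be executed on its own when the table turns out to be missing, which avoids keeping a second set of .ddl files.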

          Martijn Hendriks added a comment -

          Hi Dominique,

          Good point! When this patch is applied to a Jackrabbit installation that already uses the clustering feature, it will break if the LOCAL_REVISIONS table is not added manually. I'll look into this.

          Martijn

          Dominique Pfister added a comment -

          Hi Martijn,

          Your patch looks good to me, so please go ahead and submit it. One nice thing that might be required for people who already have a database journal: is there a way to easily detect whether the LOCAL_REVISIONS table is missing and to tell the user to upgrade their schema?

          Cheers
          Dominique

          Martijn Hendriks added a comment -

          Attached is a patch for this issue. When a DatabaseJournal is used, the local revisions are also stored in the database instead of on the local file system. This information can then be used for periodic clean-ups of the JOURNAL table which may become very large. Note that this only works if all JR information except for the search index is stored in the database. The clean-up thread is disabled by default.

          Please comment. Thanks!

          Martijn Hendriks added a comment -

          The resolution of JCR-905 allows us to remove all unnecessary revision data. That is, the minimum of all local revisions of the cluster nodes gives an upper bound on the revisions that can safely be removed from the database.

          A solution for this issue would be to add a periodic task that removes all unnecessary revisions:

          • All cluster nodes should add their local revision to the database.
          • Add a configuration option in the repository.xml to let one of the cluster nodes execute the cleanup task (e.g., a period and offset such as "every night at 00:00 hours").
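
The bound described in this comment can be sketched as follows. This is an illustration under assumptions, not the eventual implementation: the class `RevisionCleanupSketch` and its methods are hypothetical, a `Map` from node id to revision stands in for the revisions stored in the database, and the sketch conservatively treats only revisions strictly below the minimum as removable.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper; not part of Jackrabbit.
class RevisionCleanupSketch {

    /** Smallest local revision over all cluster nodes, or empty if none is known. */
    static OptionalLong minLocalRevision(Map<String, Long> localRevisions) {
        return localRevisions.values().stream().mapToLong(Long::longValue).min();
    }

    /**
     * A journal revision counts as removable only if every known node has a
     * local revision above it. With no known nodes, nothing is removable.
     */
    static boolean canPurge(long journalRevision, Map<String, Long> localRevisions) {
        OptionalLong min = minLocalRevision(localRevisions);
        return min.isPresent() && journalRevision < min.getAsLong();
    }

    public static void main(String[] args) {
        Map<String, Long> revisions = Map.of("node1", 120L, "node2", 95L);
        System.out.println("bound: " + minLocalRevision(revisions).getAsLong());
        System.out.println("94 removable: " + canPurge(94L, revisions));
        System.out.println("95 removable: " + canPurge(95L, revisions));
    }
}
```

The periodic task would compute this bound once per run and then delete everything below it in a single statement.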
          Martijn Hendriks added a comment -

          When the cluster revision table becomes too large, a cluster node without a search index and local revision number cannot be started due to memory problems (see attached stacktrace).

          Martijn Hendriks added a comment -

          I'm linking this issue to JCR-905 because the solution of that issue has an impact on whether or not all revisions should be kept.


            People

            • Assignee:
              Unassigned
              Reporter:
              Martijn Hendriks