Solr
  1. Solr
  2. SOLR-617

Allow configurable deletion policy

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: search, update
    • Labels:
      None

      Description

      Lucene API provides means to configure deletion policy. Solr should be able to expose it through configuration in solrconfig.xml. Moreover the new replication (SOLR-561) strategy is going to rely on this .

      I propose the configuration go into the <mainIndex> section

      sample configuration

      solrconfig.xml
      <mainIndex>
          <!-- configure deletion policy here-->
          <deletionPolicy>
             <!-- Store only the commits with optimize.Non optimized commits will get deleted by lucene when 
                     the last IndexWriter/IndexReader using this commit point is closed  -->
              <keepOptimizedOnly>true</keepOptimizedOnly>
               <!--Maximum no: of commit points stored . Older ones will be cleaned when they go out of scope-->
              <maxCommitsToKeep></maxCommitsToKeep>
               <!-- max age of a stored commit-->
              <maxCommitAge></maxCommitAge>    
          </deletionPolicy>
          
        </mainIndex>
      
      1. 617.patch
        19 kB
        Akshay K. Ukey
      2. solr-617.patch
        65 kB
        Shalin Shekhar Mangar
      3. solr-617.patch
        54 kB
        Akshay K. Ukey
      4. solr-617.patch
        55 kB
        Akshay K. Ukey
      5. solr-617.patch
        18 kB
        Akshay K. Ukey
      6. solr-617.patch
        19 kB
        Akshay K. Ukey

        Issue Links

          Activity

          Hide
          Akshay K. Ukey added a comment -

          This patch adds support for the configuration described in the issue description.

          Show
          Akshay K. Ukey added a comment - This patch adds support for the configuration described in the issue description.
          Hide
          Shalin Shekhar Mangar added a comment - - edited

          Thanks for the patch Akshay!

          I think we should allow a user to specify his custom IndexDeletionPolicy too. We can provide a default implementation with all the options specified in the issue description. So I propose that we have the following syntax:

          <deletionPolicy class="com.MyDeletionPolicy" />
          

          The default implementation will be SolrIndexDeletionPolicy which can be configured through a NamedList. Any custom deletion policy will be initialized with a NamedList if it implements NamedListInitializedPlugin.

          <!-- configure deletion policy here-->
              <deletionPolicy class="solr.SolrIndexDeletionPolicy">
                <!--  Store only the commits with optimize.Non optimized commits will get deleted by lucene when
                      the last IndexWriter/IndexReader using this commit point is closed  -->
                <str name="keepOptimizedOnly">true</str>
                <!--Maximum no: of commit points stored . Older ones will be cleaned when they go out of scope-->
                <str name="maxCommitsToKeep"></str>
                <!-- max age of a stored commit-->
                <str name="maxCommitAge"></str>
              </deletionPolicy>
          

          To facilitate replication, we can have a wrapper over the IndexDeletionPolicy which can provide us the features needed for replication (SOLR-561). We need access to a list of non-deleted IndexCommit instances, a way to lookup IndexCommit given a version as well as the latest commit point.

          Show
          Shalin Shekhar Mangar added a comment - - edited Thanks for the patch Akshay! I think we should allow a user to specify his custom IndexDeletionPolicy too. We can provide a default implementation with all the options specified in the issue description. So I propose that we have the following syntax: <deletionPolicy class= "com.MyDeletionPolicy" /> The default implementation will be SolrIndexDeletionPolicy which can be configured through a NamedList. Any custom deletion policy will be initialized with a NamedList if it implements NamedListInitializedPlugin. <!-- configure deletion policy here--> <deletionPolicy class= "solr.SolrIndexDeletionPolicy" > <!-- Store only the commits with optimize.Non optimized commits will get deleted by lucene when the last IndexWriter/IndexReader using this commit point is closed --> <str name= "keepOptimizedOnly" > true </str> <!--Maximum no: of commit points stored . Older ones will be cleaned when they go out of scope--> <str name= "maxCommitsToKeep" > </str> <!-- max age of a stored commit--> <str name= "maxCommitAge" > </str> </deletionPolicy> To facilitate replication, we can have a wrapper over the IndexDeletionPolicy which can provide us the features needed for replication ( SOLR-561 ). We need access to a list of non-deleted IndexCommit instances, a way to lookup IndexCommit given a version as well as the latest commit point.
          Hide
          Akshay K. Ukey added a comment -

          Attached is a patch with SolrDeletionPolicy class as the default deletion policy, with configuration as in Shalin's comment above, except that <str> is used for all the parameters in the named list. IndexDeletionPolicyWrapper class over IndexDeletionPolicy to enable access to commit details. Currently <maxCommitAge> named list parameter is not considered in the implementation of SolrDeletionPolicy.

          Show
          Akshay K. Ukey added a comment - Attached is a patch with SolrDeletionPolicy class as the default deletion policy, with configuration as in Shalin's comment above, except that <str> is used for all the parameters in the named list. IndexDeletionPolicyWrapper class over IndexDeletionPolicy to enable access to commit details. Currently <maxCommitAge> named list parameter is not considered in the implementation of SolrDeletionPolicy.
          Hide
          Shalin Shekhar Mangar added a comment -

          This is looking great. Thanks!

          We need a few tests for this. With the recent changes in Lucene trunk, we can get rid of the wrapper over IndexCommit. We need to support maxCommitAge too so that users who do a lot commits can reliably replicate without storing too many generations.

          Show
          Shalin Shekhar Mangar added a comment - This is looking great. Thanks! We need a few tests for this. With the recent changes in Lucene trunk, we can get rid of the wrapper over IndexCommit. We need to support maxCommitAge too so that users who do a lot commits can reliably replicate without storing too many generations.
          Hide
          Yonik Seeley added a comment -

          I think the deletion policy should be able to support true reservation... see the prototype patch I put together in https://issues.apache.org/jira/secure/attachment/12383728/deletion_policy.patch

          Lucene has added more capabilities since then, so we should be able to simplify it.

          Show
          Yonik Seeley added a comment - I think the deletion policy should be able to support true reservation... see the prototype patch I put together in https://issues.apache.org/jira/secure/attachment/12383728/deletion_policy.patch Lucene has added more capabilities since then, so we should be able to simplify it.
          Hide
          Shalin Shekhar Mangar added a comment -

          We have two options:

          1. Do not reserve a commit point – If it gets deleted due to a newer commit, let an in-flight replication fail so that the slave re-polls and gets a fresher commit point.
          2. Let an in-flight replication reserve a commit point – The slave would start another replication immediately after the previous one because the master now has a newer commit point that what it had just pulled.

          I'm more in favor of the first approach. Here, the onus of keeping a commit point for reliable replication will fall on the user supplying configuration according to his commit frequency (the maxAge condition will be handy).

          Also, wouldn't implementing a reserve method limit us to only the default SolrDeletionPolicy?

          Show
          Shalin Shekhar Mangar added a comment - We have two options: Do not reserve a commit point – If it gets deleted due to a newer commit, let an in-flight replication fail so that the slave re-polls and gets a fresher commit point. Let an in-flight replication reserve a commit point – The slave would start another replication immediately after the previous one because the master now has a newer commit point that what it had just pulled. I'm more in favor of the first approach. Here, the onus of keeping a commit point for reliable replication will fall on the user supplying configuration according to his commit frequency (the maxAge condition will be handy). Also, wouldn't implementing a reserve method limit us to only the default SolrDeletionPolicy?
          Hide
          Yonik Seeley added a comment -

          #1 and #2 can both lead to starvation. I think the default should finish grabbing an index even if a newer version is available.

          Here, the onus of keeping a commit point for reliable replication will fall on the user supplying configuration

          If we can make it such that it just works, regardless of network speed, index size, etc, then I think we should. A reservation mechanism would easily enable this.

          Show
          Yonik Seeley added a comment - #1 and #2 can both lead to starvation. I think the default should finish grabbing an index even if a newer version is available. Here, the onus of keeping a commit point for reliable replication will fall on the user supplying configuration If we can make it such that it just works, regardless of network speed, index size, etc, then I think we should. A reservation mechanism would easily enable this.
          Hide
          Noble Paul added a comment -

          The IndexDeletionPolicyWrapper should be able to support the reserve feature even if the user provided IndexDeletionPolicy does not do it because it is wrapping the IndexCommit object.
          the config can be as follows

            <deletionPolicy class="solr.SolrIndexDeletionPolicy" >
              //this value will be honoured by the wrapper itself irrespective of the underlying implementation
               <str name="reserve">10</str>
          </deletionPolicy>
          
          Show
          Noble Paul added a comment - The IndexDeletionPolicyWrapper should be able to support the reserve feature even if the user provided IndexDeletionPolicy does not do it because it is wrapping the IndexCommit object. the config can be as follows <deletionPolicy class= "solr.SolrIndexDeletionPolicy" > // this value will be honoured by the wrapper itself irrespective of the underlying implementation <str name= "reserve" >10</str> </deletionPolicy>
          Hide
          Akshay K. Ukey added a comment -

          Patch synced with trunk.

          Show
          Akshay K. Ukey added a comment - Patch synced with trunk.
          Hide
          Akshay K. Ukey added a comment - - edited

          This patch has:
          1. maxCommitAge configuration support
          2. Reservation mechanism is added in the ReplicationHandler configuration. Code for the same is in the latest patch of https://issues.apache.org/jira/browse/SOLR-561
          3. Test cases.

          Show
          Akshay K. Ukey added a comment - - edited This patch has: 1. maxCommitAge configuration support 2. Reservation mechanism is added in the ReplicationHandler configuration. Code for the same is in the latest patch of https://issues.apache.org/jira/browse/SOLR-561 3. Test cases.
          Hide
          Shalin Shekhar Mangar added a comment -

          The patch looks good. I think this covers all the features we wanted to have.

          A few minor nitpicks

          1. The latestCommit and maxCommitAgeMillis variables in SolrDeletionPolicy are assigned but never used.
          2. The additional logging in onInit and onCommit in SolrDeletionPolicy can be removed – the same message is logged in FINE and INFO both
          3. The defaults in solrconfig.xml should mimic the previous behavior i.e. keep only the last commit point
          4. Javadocs on IndexDeletionPolicyWrapper#setReserveDuration will be helpful
          5. IndexDeletionPolicyWrapper#getCommits should return the generic version of the Map
          6. IndexDeletionPolicyWrapper#getConfiguredDeletionPolicy can be called getWrappedDeletionPolicy or just getDeletionPolicy
          7. Slight mistake in the logging, the info message should be in the same block
            if(keepOptimizedOnly){
                      if(!commit.isOptimized())
                        commit.delete();
                      log.info("Marking unoptimized index "+getId(commit)+ " for deletion.");
                    }
            
          8. The getter methods in SolrDeletionPolicy should be named more appropriately e.g. isKeepOptimizedOnly, getMaxCommitAge, getMaxCommitsToKeep

          The tests look good. Thanks for all the work, Akshay

          Show
          Shalin Shekhar Mangar added a comment - The patch looks good. I think this covers all the features we wanted to have. A few minor nitpicks The latestCommit and maxCommitAgeMillis variables in SolrDeletionPolicy are assigned but never used. The additional logging in onInit and onCommit in SolrDeletionPolicy can be removed – the same message is logged in FINE and INFO both The defaults in solrconfig.xml should mimic the previous behavior i.e. keep only the last commit point Javadocs on IndexDeletionPolicyWrapper#setReserveDuration will be helpful IndexDeletionPolicyWrapper#getCommits should return the generic version of the Map IndexDeletionPolicyWrapper#getConfiguredDeletionPolicy can be called getWrappedDeletionPolicy or just getDeletionPolicy Slight mistake in the logging, the info message should be in the same block if (keepOptimizedOnly){ if (!commit.isOptimized()) commit.delete(); log.info( "Marking unoptimized index " +getId(commit)+ " for deletion." ); } The getter methods in SolrDeletionPolicy should be named more appropriately e.g. isKeepOptimizedOnly, getMaxCommitAge, getMaxCommitsToKeep The tests look good. Thanks for all the work, Akshay
          Hide
          Shalin Shekhar Mangar added a comment -

          Also, we need to use slf4j instead of the JDK Logger in the patch.

          Show
          Shalin Shekhar Mangar added a comment - Also, we need to use slf4j instead of the JDK Logger in the patch.
          Hide
          Akshay K. Ukey added a comment -

          Patch with changes suggested by Shalin and logging using slf4j.

          Show
          Akshay K. Ukey added a comment - Patch with changes suggested by Shalin and logging using slf4j.
          Hide
          Shalin Shekhar Mangar added a comment -

          Updated with more javadocs and comments in solrconfig.xml

          I think this is ready to go in. I'll commit this in a day or two if there are no objection.

          Show
          Shalin Shekhar Mangar added a comment - Updated with more javadocs and comments in solrconfig.xml I think this is ready to go in. I'll commit this in a day or two if there are no objection.
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 699975.

          Thanks Yonik, Noble and Akshay!

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 699975. Thanks Yonik, Noble and Akshay!
          Hide
          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4

          Show
          Grant Ingersoll added a comment - Bulk close for Solr 1.4

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Noble Paul
            • Votes:
              2 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development