Issue Details (XML | Word | Printable)

Key: SOLR-617
Type: New Feature New Feature
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Shalin Shekhar Mangar
Reporter: Noble Paul
Votes: 2
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Solr

Allow configurable deletion policy

Created: 07/Jul/08 06:18 AM   Updated: 10/Nov/09 03:51 PM
Return to search
Component/s: search, update
Affects Version/s: 1.4
Fix Version/s: 1.4

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works 617.patch 2008-09-01 06:17 PM Akshay K. Ukey 19 kB
Text File Licensed for inclusion in ASF works solr-617.patch 2008-09-26 09:15 PM Shalin Shekhar Mangar 65 kB
Text File Licensed for inclusion in ASF works solr-617.patch 2008-09-26 07:13 PM Akshay K. Ukey 54 kB
Text File Licensed for inclusion in ASF works solr-617.patch 2008-09-23 07:54 AM Akshay K. Ukey 55 kB
Text File Licensed for inclusion in ASF works solr-617.patch 2008-09-08 01:38 PM Akshay K. Ukey 18 kB
Text File Licensed for inclusion in ASF works solr-617.patch 2008-08-25 12:50 PM Akshay K. Ukey 19 kB
Issue Links:
Dependants
 

Resolution Date: 29/Sep/08 03:38 AM


 Description  « Hide
Lucene API provides means to configure deletion policy. Solr should be able to expose it through configuration in solrconfig.xml. Moreover the new replication (SOLR-561) strategy is going to rely on this .

I propose the configuration go into the <mainIndex> section

sample configuration

solrconfig.xml
<mainIndex>
    <!-- configure deletion policy here-->
    <deletionPolicy>
       <!-- Store only the commits with optimize.Non optimized commits will get deleted by lucene when 
               the last IndexWriter/IndexReader using this commit point is closed  -->
        <keepOptimizedOnly>true</keepOptimizedOnly>
         <!--Maximum no: of commit points stored . Older ones will be cleaned when they go out of scope-->
        <maxCommitsToKeep></maxCommitsToKeep>
         <!-- max age of a stored commit-->
        <maxCommitAge></maxCommitAge>    
    </deletionPolicy>
    
  </mainIndex>


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Akshay K. Ukey added a comment - 25/Aug/08 12:50 PM
This patch adds support for the configuration described in the issue description.

Shalin Shekhar Mangar added a comment - 28/Aug/08 02:11 PM - edited
Thanks for the patch Akshay!

I think we should allow a user to specify his custom IndexDeletionPolicy too. We can provide a default implementation with all the options specified in the issue description. So I propose that we have the following syntax:

<deletionPolicy class="com.MyDeletionPolicy" />

The default implementation will be SolrIndexDeletionPolicy which can be configured through a NamedList. Any custom deletion policy will be initialized with a NamedList if it implements NamedListInitializedPlugin.

<!-- configure deletion policy here-->
    <deletionPolicy class="solr.SolrIndexDeletionPolicy">
      <!--  Store only the commits with optimize.Non optimized commits will get deleted by lucene when
            the last IndexWriter/IndexReader using this commit point is closed  -->
      <str name="keepOptimizedOnly">true</str>
      <!--Maximum no: of commit points stored . Older ones will be cleaned when they go out of scope-->
      <str name="maxCommitsToKeep"></str>
      <!-- max age of a stored commit-->
      <str name="maxCommitAge"></str>
    </deletionPolicy>

To facilitate replication, we can have a wrapper over the IndexDeletionPolicy which can provide us the features needed for replication (SOLR-561). We need access to a list of non-deleted IndexCommit instances, a way to lookup IndexCommit given a version as well as the latest commit point.


Akshay K. Ukey added a comment - 01/Sep/08 06:17 PM
Attached is a patch with SolrDeletionPolicy class as the default deletion policy, with configuration as in Shalin's comment above, except that <str> is used for all the parameters in the named list. IndexDeletionPolicyWrapper class over IndexDeletionPolicy to enable access to commit details. Currently <maxCommitAge> named list parameter is not considered in the implementation of SolrDeletionPolicy.

Shalin Shekhar Mangar added a comment - 02/Sep/08 11:58 AM
This is looking great. Thanks!

We need a few tests for this. With the recent changes in Lucene trunk, we can get rid of the wrapper over IndexCommit. We need to support maxCommitAge too so that users who do a lot commits can reliably replicate without storing too many generations.


Yonik Seeley added a comment - 02/Sep/08 05:17 PM
I think the deletion policy should be able to support true reservation... see the prototype patch I put together in https://issues.apache.org/jira/secure/attachment/12383728/deletion_policy.patch

Lucene has added more capabilities since then, so we should be able to simplify it.


Shalin Shekhar Mangar added a comment - 02/Sep/08 06:12 PM
We have two options:
  1. Do not reserve a commit point – If it gets deleted due to a newer commit, let an in-flight replication fail so that the slave re-polls and gets a fresher commit point.
  2. Let an in-flight replication reserve a commit point – The slave would start another replication immediately after the previous one because the master now has a newer commit point that what it had just pulled.

I'm more in favor of the first approach. Here, the onus of keeping a commit point for reliable replication will fall on the user supplying configuration according to his commit frequency (the maxAge condition will be handy).

Also, wouldn't implementing a reserve method limit us to only the default SolrDeletionPolicy?


Yonik Seeley added a comment - 03/Sep/08 01:06 AM
#1 and #2 can both lead to starvation. I think the default should finish grabbing an index even if a newer version is available.

Here, the onus of keeping a commit point for reliable replication will fall on the user supplying configuration

If we can make it such that it just works, regardless of network speed, index size, etc, then I think we should. A reservation mechanism would easily enable this.


Noble Paul added a comment - 03/Sep/08 04:49 AM
The IndexDeletionPolicyWrapper should be able to support the reserve feature even if the user provided IndexDeletionPolicy does not do it because it is wrapping the IndexCommit object.
the config can be as follows
<deletionPolicy class="solr.SolrIndexDeletionPolicy" >
    //this value will be honoured by the wrapper itself irrespective of the underlying implementation
     <str name="reserve">10</str>
</deletionPolicy>

Akshay K. Ukey added a comment - 08/Sep/08 01:38 PM
Patch synced with trunk.

Akshay K. Ukey added a comment - 23/Sep/08 07:54 AM - edited
This patch has:
1. maxCommitAge configuration support
2. Reservation mechanism is added in the ReplicationHandler configuration. Code for the same is in the latest patch of https://issues.apache.org/jira/browse/SOLR-561
3. Test cases.

Shalin Shekhar Mangar added a comment - 24/Sep/08 07:03 PM
The patch looks good. I think this covers all the features we wanted to have.

A few minor nitpicks

  1. The latestCommit and maxCommitAgeMillis variables in SolrDeletionPolicy are assigned but never used.
  2. The additional logging in onInit and onCommit in SolrDeletionPolicy can be removed – the same message is logged in FINE and INFO both
  3. The defaults in solrconfig.xml should mimic the previous behavior i.e. keep only the last commit point
  4. Javadocs on IndexDeletionPolicyWrapper#setReserveDuration will be helpful
  5. IndexDeletionPolicyWrapper#getCommits should return the generic version of the Map
  6. IndexDeletionPolicyWrapper#getConfiguredDeletionPolicy can be called getWrappedDeletionPolicy or just getDeletionPolicy
  7. Slight mistake in the logging, the info message should be in the same block
    if(keepOptimizedOnly){
              if(!commit.isOptimized())
                commit.delete();
              log.info("Marking unoptimized index "+getId(commit)+ " for deletion.");
            }
  8. The getter methods in SolrDeletionPolicy should be named more appropriately e.g. isKeepOptimizedOnly, getMaxCommitAge, getMaxCommitsToKeep

The tests look good. Thanks for all the work, Akshay


Shalin Shekhar Mangar added a comment - 24/Sep/08 07:34 PM
Also, we need to use slf4j instead of the JDK Logger in the patch.

Akshay K. Ukey added a comment - 26/Sep/08 07:13 PM
Patch with changes suggested by Shalin and logging using slf4j.

Shalin Shekhar Mangar added a comment - 26/Sep/08 09:15 PM
Updated with more javadocs and comments in solrconfig.xml

I think this is ready to go in. I'll commit this in a day or two if there are no objection.


Shalin Shekhar Mangar added a comment - 29/Sep/08 03:38 AM
Committed revision 699975.

Thanks Yonik, Noble and Akshay!


Grant Ingersoll added a comment - 10/Nov/09 03:51 PM
Bulk close for Solr 1.4