Solr
  1. Solr
  2. SOLR-3033

"numberToKeep" on replication handler does not work with "backupAfter"

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6
    • Component/s: replication (java)
    • Labels:
      None
    • Environment:

      openjdk 1.6, linux 3.x

      Description

      Configured my replication handler like this:

      <requestHandler name="/replication" class="solr.ReplicationHandler" >
      <lst name="master">
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">optimize</str>
      <str name="confFiles">elevate.xml,schema.xml,spellings.txt,stopwords.txt,stopwords_de.txt,stopwords_en.txt,synonyms_de.txt,synonyms.txt</str>
      <str name="backupAfter">optimize</str>
      <str name="numberToKeep">1</str>
      </lst>
      </requestHandler>

      So after optimize a snapshot should be taken, this works. But numberToKeep is ignored, snapshots are increasing with each call to optimize and are kept forever. Seems this settings have no effect.

      1. SOLR-3033-failingtest.patch
        13 kB
        Tomás Fernández Löbbe
      2. SOLR-3033.patch
        8 kB
        James Dyer
      3. SOLR-3033.patch
        9 kB
        James Dyer

        Activity

        Hide
        James Dyer added a comment -

        Torsten,

        I think for this to work you need to put the "numberToKeep" parameter in the request URL, not in the configuration file. I realize this is counterintuitive because most request handlers let you specify parameters either way. I think the reason it doesn't work in the config file might be because you've got to nest parameters within the <lst name="master" /> element. So try putting this in the url and see if that works for you.

        In any case, I'm thinking the only thing to do for this is perhaps clarify this point in the wiki. Even if we could fix this, it wouldn't be appropriate to put this parameter is every replication request as typically you'd use the same handler for both replication and backups, and this one applies to backups only. Anyone have any thoughts about this?

        Show
        James Dyer added a comment - Torsten, I think for this to work you need to put the "numberToKeep" parameter in the request URL, not in the configuration file. I realize this is counterintuitive because most request handlers let you specify parameters either way. I think the reason it doesn't work in the config file might be because you've got to nest parameters within the <lst name="master" /> element. So try putting this in the url and see if that works for you. In any case, I'm thinking the only thing to do for this is perhaps clarify this point in the wiki. Even if we could fix this, it wouldn't be appropriate to put this parameter is every replication request as typically you'd use the same handler for both replication and backups, and this one applies to backups only. Anyone have any thoughts about this?
        Hide
        Torsten Krah added a comment -

        I do not call an url. Just doing an optimize call.
        The option "backupAfter" would make no sense if e.g. set to optimize, if replication handler can not be configured for "backup" usage to tell how much numbers to keep.

        Why its not possible to propagate numberToKeep to the backup function call? backupAfter is propagated too and numberToKeep would match to this scenario, wouldn't it?
        If i have to make an explicit call for a snapshot - ehh how would such a request look? - the option backupAfter optimize is suboptimal as diskspace may be running out until the url request is coming in which does delete the old ones, isn't it?

        Show
        Torsten Krah added a comment - I do not call an url. Just doing an optimize call. The option "backupAfter" would make no sense if e.g. set to optimize, if replication handler can not be configured for "backup" usage to tell how much numbers to keep. Why its not possible to propagate numberToKeep to the backup function call? backupAfter is propagated too and numberToKeep would match to this scenario, wouldn't it? If i have to make an explicit call for a snapshot - ehh how would such a request look? - the option backupAfter optimize is suboptimal as diskspace may be running out until the url request is coming in which does delete the old ones, isn't it?
        Hide
        James Dyer added a comment -

        I see what you're saying. I wasn't thinking about using the "backupAfter" parameter at all. It seems reasonable then, because of "backupAfter", we would have to support having "numberToKeep" in the configuration.

        As a workaround, you can execute backups by calling the command url. The syntax is:

        http://master_host:port/solr/replication?command=backup&numberToKeep=1
        

        See http://wiki.apache.org/solr/SolrReplication#HTTP_API for more information.

        Show
        James Dyer added a comment - I see what you're saying. I wasn't thinking about using the "backupAfter" parameter at all. It seems reasonable then, because of "backupAfter", we would have to support having "numberToKeep" in the configuration. As a workaround, you can execute backups by calling the command url. The syntax is: http://master_host:port/solr/replication?command=backup&numberToKeep=1 See http://wiki.apache.org/solr/SolrReplication#HTTP_API for more information.
        Hide
        James Dyer added a comment -

        Here's a patch that allows "numberToKeep" in the config file.

        Show
        James Dyer added a comment - Here's a patch that allows "numberToKeep" in the config file.
        Hide
        Hoss Man added a comment -

        James: two things to think about.

        1) when adding new test configs, try to keep them as minimal as possible, so the only things in them are things that have to be there for the purposes of the test.

        2) there are really two types of "params" when dealing with request handlers – init params (ie: things in the body of the requestHandler tag in solrconfig.xml) and request params (things passed to the handler when it is executed. via RequestHandlerBase many request handlers support the idea of init params named "defaults", "invariants" and "appends" which can contain sub-params that are consulted when parsing/processing request params in handleRequest.

        In the case of the "numberToKeep", this is already a request param, and ReplicationHandler already subclasses RequestHandlerBase which means people can define a "defaults" section in their ReplicationHandler config so any requests to "http://master_host:port/solr/replication?command=backup" get that value automaticly. but your patch seems to add support for an init param with the same name: which raises questions like "what happens if i specify differnet values for numberToKeep in init params and in invariant params?"

        it seems like the crux of the problem is that if you use the "backupAfter" option, the code path to create the backup bypasses a lot of the logic that is normally used when a backup command is processed via handleRequest. So instead of adding an init param version of numberToKeep, perhaps it owuld be better if the "backupAfter" codepath followed the same code path as handleRequest as much as possible? perhaps it could be something as straight forward as changing the meat of getEventListener to look like...

        SolrQueryRequest req = new LocalSolrRequest(core ...);
        try {
          RequestHandler.this.handleRequest(req, new SolrQueryResponse());
        } finally {
          req.close();
        }
        

        what do you think?

        Show
        Hoss Man added a comment - James: two things to think about. 1) when adding new test configs, try to keep them as minimal as possible, so the only things in them are things that have to be there for the purposes of the test. 2) there are really two types of "params" when dealing with request handlers – init params (ie: things in the body of the requestHandler tag in solrconfig.xml) and request params (things passed to the handler when it is executed. via RequestHandlerBase many request handlers support the idea of init params named "defaults", "invariants" and "appends" which can contain sub-params that are consulted when parsing/processing request params in handleRequest. In the case of the "numberToKeep", this is already a request param, and ReplicationHandler already subclasses RequestHandlerBase which means people can define a "defaults" section in their ReplicationHandler config so any requests to "http://master_host:port/solr/replication?command=backup" get that value automaticly. but your patch seems to add support for an init param with the same name: which raises questions like "what happens if i specify differnet values for numberToKeep in init params and in invariant params?" it seems like the crux of the problem is that if you use the "backupAfter" option, the code path to create the backup bypasses a lot of the logic that is normally used when a backup command is processed via handleRequest. So instead of adding an init param version of numberToKeep, perhaps it owuld be better if the "backupAfter" codepath followed the same code path as handleRequest as much as possible? perhaps it could be something as straight forward as changing the meat of getEventListener to look like... SolrQueryRequest req = new LocalSolrRequest(core ...); try { RequestHandler. this .handleRequest(req, new SolrQueryResponse()); } finally { req.close(); } what do you think?
        Hide
        Hoss Man added a comment -

        updating summary to clarify scope of problem

        Show
        Hoss Man added a comment - updating summary to clarify scope of problem
        Hide
        James Dyer added a comment -

        So instead of adding an init param version of numberToKeep, perhaps it owuld be better if the "backupAfter" codepath followed the same code path as handleRequest as much as possible?

        It wasn't so much my intention to add an init-param but to make a way to give a default value for this in solrconfig.xml as you can for other handlers. Without a way to declare a default in solrconfig.xml, the user has no way to use this parameter should a backup be triggered by "backupAfter".

        When I looked at this, it didn't seem that ReplicationHandler follows the normal rules. We don't have a <defaults /> section for request parameters, do we? And looking at the available request parameters, we probably wouldn't want defaults for any of them (see http://wiki.apache.org/solr/SolrReplication#HTTP_API).

        This makes me wonder if my first try was a mistake. Possibly this should only be an init-param. This would let users configure how many to keep on the Master, and how many to keep on the Slave. We don't let users change poll intervals with request params, so why let them change the archive policy with request params?

        If we kept it as a request-param only, but then let the user specify defaults, would that create a legal <defaults /> and <invariants /> section nested within <master /> and <slave />, so users can specify defaults for each?

        I don't have a strong feeling on this and would change the patch to work any way you thought best. Somehow it seems that "numberToKeep" needs to have a default setting somewhere, somehow, so it will work with "backupAfter".

        Show
        James Dyer added a comment - So instead of adding an init param version of numberToKeep, perhaps it owuld be better if the "backupAfter" codepath followed the same code path as handleRequest as much as possible? It wasn't so much my intention to add an init-param but to make a way to give a default value for this in solrconfig.xml as you can for other handlers. Without a way to declare a default in solrconfig.xml, the user has no way to use this parameter should a backup be triggered by "backupAfter". When I looked at this, it didn't seem that ReplicationHandler follows the normal rules. We don't have a <defaults /> section for request parameters, do we? And looking at the available request parameters, we probably wouldn't want defaults for any of them (see http://wiki.apache.org/solr/SolrReplication#HTTP_API ). This makes me wonder if my first try was a mistake. Possibly this should only be an init-param. This would let users configure how many to keep on the Master, and how many to keep on the Slave. We don't let users change poll intervals with request params, so why let them change the archive policy with request params? If we kept it as a request-param only, but then let the user specify defaults, would that create a legal <defaults /> and <invariants /> section nested within <master /> and <slave />, so users can specify defaults for each? I don't have a strong feeling on this and would change the patch to work any way you thought best. Somehow it seems that "numberToKeep" needs to have a default setting somewhere, somehow, so it will work with "backupAfter".
        Hide
        Hoss Man added a comment -

        Without a way to declare a default in solrconfig.xml, the user has no way to use this parameter should a backup be triggered by "backupAfter".

        right – my point is we already have a convention for specifying "default params for a request handler" but your patch doesn't use that convention.

        We don't have a <defaults /> section for request parameters, do we?

        Any request handler that subclasses RequestHandlerBase automatically gets defaults applied when handleRequest is called if they are specified in the configs (the syntax isn't "<defaults />" it's "<lst name="defaults">...</lst><lst name="appends">...</lst><lst name="invariants">...</lst>)

        If we kept it as a request-param only, but then let the user specify defaults, would that create a legal <defaults /> and <invariants /> section nested within <master /> and <slave />, so users can specify defaults for each?

        I'm not sure that would really make sense .. what if an instances was acting as a repeater so it's both a master and a slave? if you told it to create a backup, how many would it keep if there was a differnet value specified in the master/slave sections?

        I think maybe you've hit the nail on the head here...

        And looking at the available request parameters, we probably wouldn't want defaults for any of them
        ...
        This makes me wonder if my first try was a mistake. Possibly this should only be an init-param.

        So perhaps the way forward is...

        • keep the "numberToKeep" request param around for backcompat with Solr 3.5 for people who want to manually specify it when triggering command=backups
        • add a new init param for ReplicationHandler to specify how many backups to keep when backups are made – the name for this new param should probably not be numberToKeep (suggestion: "maxNumberOfBackups") because:
          • we need a name that clarifies it's specific to backups
          • we want a name that is distinct from the request param so in docs it's clear which one is being refered to
        • document clearly the interaction between the maxNumberOfBackups init param and the numberToKeep request param (suggestion: "the numberToKeep request param can be used with the backup command unless the maxNumberOfBackups init param has been specified on the handler – in which case maxNumberOfBackups is always used and attempts to use the numberToKeep request param will cause an error"

        what do you think?

        Show
        Hoss Man added a comment - Without a way to declare a default in solrconfig.xml, the user has no way to use this parameter should a backup be triggered by "backupAfter". right – my point is we already have a convention for specifying "default params for a request handler" but your patch doesn't use that convention. We don't have a <defaults /> section for request parameters, do we? Any request handler that subclasses RequestHandlerBase automatically gets defaults applied when handleRequest is called if they are specified in the configs (the syntax isn't "<defaults />" it's "<lst name="defaults">...</lst><lst name="appends">...</lst><lst name="invariants">...</lst>) If we kept it as a request-param only, but then let the user specify defaults, would that create a legal <defaults /> and <invariants /> section nested within <master /> and <slave />, so users can specify defaults for each? I'm not sure that would really make sense .. what if an instances was acting as a repeater so it's both a master and a slave? if you told it to create a backup, how many would it keep if there was a differnet value specified in the master/slave sections? I think maybe you've hit the nail on the head here... And looking at the available request parameters, we probably wouldn't want defaults for any of them ... This makes me wonder if my first try was a mistake. Possibly this should only be an init-param. So perhaps the way forward is... keep the "numberToKeep" request param around for backcompat with Solr 3.5 for people who want to manually specify it when triggering command=backups add a new init param for ReplicationHandler to specify how many backups to keep when backups are made – the name for this new param should probably not be numberToKeep (suggestion: "maxNumberOfBackups") because: we need a name that clarifies it's specific to backups we want a name that is distinct from the request param so in docs it's clear which one is being refered to document clearly the interaction between the maxNumberOfBackups init param and the numberToKeep request param (suggestion: "the numberToKeep request param can be used with the backup command unless the maxNumberOfBackups init param has been specified on the handler – in which case maxNumberOfBackups is always used and attempts to use the numberToKeep request param will cause an error" what do you think?
        Hide
        James Dyer added a comment -

        I agree we should have it as a request param for backwards compatibility, and allow it as a better-named initParam. Clear documentation in the wiki would be in order. Two things though:

        1. Maybe we should have the request param override the init param rather than generate an error. This is consistent with how handler params work in general. As it was lost on me, many people won't appreciate the subtle difference between an init-param and a request-param in this case and will just want it to behave like any other handler. (moot point here if we are just removing the init-param from 4.x and keeping it, deprecated, in 3.x)

        2. Are you saying that we should require this param to be outside <slave/> and <master/>, thus avoiding the conflict if a node is a repeater? We could allow it inside <slave/> and <master/> and document that in the case of a repeater the value in <slave/> takes precedence over the value in <master/>. This is more confusing for the repeater case, but simpler in that it seems every other init parameter gets specified separately for slaves and master.

        Once again I don't have a strong preference here but the thoughts occurred...

        Show
        James Dyer added a comment - I agree we should have it as a request param for backwards compatibility, and allow it as a better-named initParam. Clear documentation in the wiki would be in order. Two things though: 1. Maybe we should have the request param override the init param rather than generate an error. This is consistent with how handler params work in general. As it was lost on me, many people won't appreciate the subtle difference between an init-param and a request-param in this case and will just want it to behave like any other handler. (moot point here if we are just removing the init-param from 4.x and keeping it, deprecated, in 3.x) 2. Are you saying that we should require this param to be outside <slave/> and <master/>, thus avoiding the conflict if a node is a repeater? We could allow it inside <slave/> and <master/> and document that in the case of a repeater the value in <slave/> takes precedence over the value in <master/>. This is more confusing for the repeater case, but simpler in that it seems every other init parameter gets specified separately for slaves and master. Once again I don't have a strong preference here but the thoughts occurred...
        Hide
        James Dyer added a comment -

        New patch incorporates Hoss's suggestions from 1-12-2012.

        • "numberToKeep" stays as a request param
        • "maxNumberOfBackups" is introduced as a init param. This is top-level and cannot be specified separately for masters & slaves.
        • Trying to use both params results in a "BAD_REQUEST" (400) SolrException.

        I will commit this shortly to Trunk & 3.x along with Wiki changes.

        Show
        James Dyer added a comment - New patch incorporates Hoss's suggestions from 1-12-2012. "numberToKeep" stays as a request param "maxNumberOfBackups" is introduced as a init param. This is top-level and cannot be specified separately for masters & slaves. Trying to use both params results in a "BAD_REQUEST" (400) SolrException. I will commit this shortly to Trunk & 3.x along with Wiki changes.
        Hide
        James Dyer added a comment -
        • Committed r1294702 (trunk) & r1294718 (3.x).
        • Updated the Wiki with explanation of the new parameter

        Thanks for reporting this Torsten.

        Show
        James Dyer added a comment - Committed r1294702 (trunk) & r1294718 (3.x). Updated the Wiki with explanation of the new parameter Thanks for reporting this Torsten.
        Hide
        Tomás Fernández Löbbe added a comment - - edited

        This fix is not working for me with this configuration:

          <requestHandler name="/replication" class="solr.ReplicationHandler">
            <lst name="master">
            	<str name="replicateAfter">commit</str>
                <str name="backupAfter">commit</str>
            </lst>
            <str name="maxNumberOfBackups">3</str>
          </requestHandler>
        

        The backups are created as expected, but snapshots are never deleted. They do if I issue a request with "numberToKeep", but not automatically after a commit.
        I'm attaching a test case.

        My tests and the patch are for trunk.

        Show
        Tomás Fernández Löbbe added a comment - - edited This fix is not working for me with this configuration: <requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="master"> <str name="replicateAfter">commit</str> <str name="backupAfter">commit</str> </lst> <str name="maxNumberOfBackups">3</str> </requestHandler> The backups are created as expected, but snapshots are never deleted. They do if I issue a request with "numberToKeep", but not automatically after a commit. I'm attaching a test case. My tests and the patch are for trunk.
        Hide
        Tomás Fernández Löbbe added a comment -

        James, did you get a chance to review my patch? am I missing something?

        Show
        Tomás Fernández Löbbe added a comment - James, did you get a chance to review my patch? am I missing something?
        Hide
        James Dyer added a comment -

        Tomas,

        You're correct this is a bug. I opened SOLR-3361 for this and attached a fix. Thank you for reporting this.

        Show
        James Dyer added a comment - Tomas, You're correct this is a bug. I opened SOLR-3361 for this and attached a fix. Thank you for reporting this.

          People

          • Assignee:
            James Dyer
            Reporter:
            Torsten Krah
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development