Details

      Description

      The output connection must support commitWithin (http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22) in addition to sending a commit() at the end of a job.

      This allows for efficient handling of commits on the Solr side.

      The parameter should ideally be configurable per job. That way you could set commitWithin=10s for "Important job" and commitWithin=600s for "Big crawl job".
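      To make the request concrete, here is a minimal sketch of a Solr XML update message carrying the commitWithin attribute described at the linked wiki page. It assumes the per-job values above are stored in a simple lookup table; the table and the helper name build_add_message are hypothetical and not part of ManifoldCF. Solr interprets commitWithin in milliseconds, so 10s and 600s become 10000 and 600000.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-job settings; Solr expects commitWithin in milliseconds.
JOB_COMMIT_WITHIN_MS = {
    "Important job": 10_000,    # 10 s
    "Big crawl job": 600_000,   # 600 s
}


def build_add_message(doc_fields, commit_within_ms):
    """Build a Solr XML <add> message asking Solr to commit within the given time."""
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    doc = ET.SubElement(add, "doc")
    for name, value in doc_fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(add, encoding="unicode")
```

      For example, build_add_message({"id": "doc1"}, JOB_COMMIT_WITHIN_MS["Important job"]) returns <add commitWithin="10000"><doc><field name="id">doc1</field></doc></add>. Solr can then batch commits on its side instead of relying only on the single commit sent at the end of the job.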

      1. CONNECTORS-202-code.patch
        8 kB
        Karl Wright
      2. CONNECTORS-202.patch
        2 kB
        Jan Høydahl

        Issue Links

          Activity

          Karl Wright added a comment -

          An explicit commit is transmitted at the end of every job, and when a job is aborted, but I agree that that is not ideal. Your suggestion seems like a reasonable thing to add - probably on a new output specification tab, "Solr Commits" or some such. Pretty straightforward to code as well. Would you be interested in submitting a patch?

          Jan Høydahl added a comment -

          I've created a Solr patch to allow commitWithin as a request parameter.

          I guess this means that on the MCF side we could simply set a Name/Value pair on the SolrOutputConnector or change from "/update/extract" to "/update/extract?commitWithin=10000".

          For usability's sake, though, it probably makes sense to expose it as an explicit parameter on the "Commits" tab, below the "Commit at end of every job" checkbox.
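          The URL-based workaround amounts to appending a query parameter to the update handler path. The sketch below shows one way to do that safely with the Python standard library, replacing any existing commitWithin value so it is never sent twice. The function name with_commit_within is hypothetical, and the request parameter is only honored by Solr versions that include the SOLR-2540 change.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_commit_within(path, commit_within_ms):
    """Add (or replace) a commitWithin query parameter on a Solr update path."""
    parts = urlsplit(path)
    # Keep every existing parameter except an old commitWithin value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "commitWithin"
    ]
    query.append(("commitWithin", str(commit_within_ms)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

          For example, with_commit_within("/update/extract", 10000) yields "/update/extract?commitWithin=10000", matching the hand-edited path above.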

          Karl Wright added a comment -

          Yes, making it explicit is preferred. But I thought you wanted to be able to set this on a per-job basis?

          Jan Høydahl added a comment -

          Right, that would be best. If the parameter is set per job, it should override the default on the output connector. This could be a pet project for me, a chance to contribute something simple to MCF.
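          The override rule described here is simple to state precisely. A minimal sketch, assuming both the connector default and the per-job value are optional millisecond values; the function name effective_commit_within is hypothetical, and returning None stands for "do not send commitWithin at all".

```python
from typing import Optional


def effective_commit_within(connector_default_ms: Optional[int],
                            job_override_ms: Optional[int]) -> Optional[int]:
    """Return the commitWithin value to send: the job setting wins over the connector default."""
    value = job_override_ms if job_override_ms is not None else connector_default_ms
    if value is not None and value < 0:
        raise ValueError("commitWithin must be a non-negative number of milliseconds")
    return value
```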

          Jan Høydahl added a comment -

          SOLR-2540 is now committed, meaning that MCF may start sending &commitWithin=N GET parameters to Solr. How should we proceed?

          Karl Wright added a comment -

          Great news!
          My suggestion is to wait to release this as a formal MCF feature until the currently released version of Solr supports it. Otherwise we risk confusing people, and there is the workaround of explicitly providing the parameter within the connector's generic parameter feature. So for now, I think updating the end-user documentation would be best, and then when the next rev of Solr is released we can add the explicit feature you suggest. Does this sound reasonable?

          Jan Høydahl added a comment -

          The feature will appear in Solr 3.4, so we will need to document this version requirement somewhere in the MCF docs anyway. Doesn't that mean it makes little difference whether it is added now or later?

          If the commitWithin parameter is sent to earlier versions of Solr, they will simply ignore it silently.

          Karl Wright added a comment -

          I have no problem if you want to submit a patch for this feature; I can commit it just as soon as Solr 3.4 goes out the door. I just don't think I'd hold the 0.3-incubating ManifoldCF release on account of it, unless Solr 3.4 is due to be released very shortly (days).

          Jan Høydahl added a comment -

          Solr 3.4 is being released as we speak; it will hit the road in a few days.

          Karl Wright added a comment -

          MCF 0.3 RC1 is currently under review as well.
          I've triaged this ticket for the 0.4 release.

          Jan Høydahl added a comment -

          That's OK; people can set commitWithin in the Solr output connector as a workaround. I'm not familiar with the MCF code, but I will see if I can get back to a patch attempt later.

          Karl Wright added a comment -

          I looked briefly at how you'd want to do this. The way output connectors are currently designed requires one of the following:

          • The commit-within parameter is part of configuration information, in which case it is per-connection, not per job. But in this case a change to the commit-within info will not cause any documents to be reindexed.
          • The commit-within parameter is part of output specification information, in which case it is per-job. However, any changes to the parameter will cause all documents associated with that job to be reindexed the next time the job is run.

          It is also the case that the Solr output connector already has a configuration tab where a commit-within parameter would logically fit, but if output specification were used, a new tab would probably need to be introduced.

          While it is possible to change the output connector API so that specification information is available directly at the time the request to add to the index is made, all this together argues that maybe we should consider the parameter to be configuration not specification information. It is, after all, "how" information and not "what". If a user needs both "urgent" and "lazy" commits, they can readily do this by creating two Solr connections. Doesn't seem like there would be too much of a downside to this approach. What do you think?
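          The reindexing trade-off can be illustrated with a simplified model. The sketch below assumes that a document's stored version string folds in the job's output specification, so any change to a specification value changes the version and forces a reindex, while connection configuration is left out of the version. The names document_version and needs_reindex are hypothetical; this is an illustration of the reasoning above, not ManifoldCF's actual versioning code.

```python
import hashlib


def document_version(content_version, output_spec):
    """Hypothetical version string: content version plus a digest of the output specification."""
    spec = ";".join(f"{key}={output_spec[key]}" for key in sorted(output_spec))
    return content_version + "|" + hashlib.sha1(spec.encode("utf-8")).hexdigest()


def needs_reindex(old_version, new_version):
    """A document is re-sent to the index whenever its version string changes."""
    return old_version != new_version


# Connection configuration (per connection, not part of the version string):
# two Solr connections give "urgent" and "lazy" commits without touching specs.
CONNECTIONS = {
    "solr-urgent": {"commitWithin": 10_000},
    "solr-lazy": {"commitWithin": 600_000},
}
```

          Under this model, putting commitWithin in the output specification means that changing it from 10000 to 600000 alters every document's version and triggers a full reindex, whereas changing it in CONNECTIONS leaves all versions untouched.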

          Jan Høydahl added a comment -

          Given the way output specifications work, I guess it's acceptable for now to do this per output connection and create multiple connections if needed. Then this boils down to a documentation issue. Could you fit in a few words about this as a best practice (from Solr 3.4 onward)?

          Karl Wright added a comment -

          I'd be happy to commit anything you want to say in the documentation. Just submit a patch. The site is under site/src/documentation/content/xdocs.

          If you're OK with the proposed solution, there's a chance I might be able to do it next week. Alternatively, you can submit a patch for that as well, and then I can check it into trunk (and people can apply it to 0.3 if they want the functionality in a more supported form in 0.3).

          Jan Høydahl added a comment -

          Proposed end-user documentation update for the update parameters tab, including examples for commitWithin and update.chain.

          Karl Wright added a comment -

          Looks fine. I'll commit it and update the site this evening.

          Karl Wright added a comment -

          r1170174 for the documentation update.

          Karl Wright added a comment -

          Attached patch for review

          Karl Wright added a comment -

          Hi Jan, please let me know if the attached code patch works for you.

          Jan Høydahl added a comment -

          I applied the patch, built MCF and created a Solr output connection. Looks nice. I have not yet tested actual indexing into Solr.

          Karl Wright added a comment -

          r1171158


            People

            • Assignee:
              Karl Wright
              Reporter:
              Jan Høydahl
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved:
