Solr
  1. Solr
  2. SOLR-1709

Distributed Date and Range Faceting

    Details

      Description

      This patch is for adding support for date facets when using distributed searches.

      Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of:
      Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time).
      The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in.
      This means that if subsequent shards' facet_dates are skewed in relation to the first by >1 'gap', these 'earlier' or 'later' facets will not be merged in.
      There are several reasons for this:

      • Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards
      • If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested
        (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data)
        This could be dealt with if timezone and skew information was added, and the dates were normalized.
        One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, and so multiple shards' time data can be normalized.

      The patch affects 2 files in the Solr core:
      org.apache.solr.handler.component.FacetComponent.java
      org.apache.solr.handler.component.ResponseBuilder.java

      The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage.
      One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired.
      Comments & suggestions welcome.

      As a favour to ask, if anyone could take my 2 source files and create a PATCH file from it, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company).

      1. SOLR-1709_3x.patch
        15 kB
        Simon Willnauer
      2. SOLR-1709.patch
        11 kB
        Hoss Man
      3. SOLR-1709_distributed_date_faceting_v3x.patch
        8 kB
        David Smiley
      4. solr-1.4.0-solr-1709.patch
        5 kB
        Thomas Hammerl
      5. FacetComponent.java
        27 kB
        Peter Sturge
      6. ResponseBuilder.java
        8 kB
        Peter Sturge
      7. FacetComponent.java
        27 kB
        Peter Sturge

        Issue Links

          Activity

          Hide
          Alejandro Marqués added a comment -

          Should the facet.mincount parameter work with distributed date faceting?

          I have two Solr servers (http://localhost:8090/solr1/ and http://localhost:8090/solr2/) each one with part of the docs from exampledocs

          If I make the next query:
          http://localhost:8090/solr1/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&facet=true&facet.mincount=1&facet.date=manufacturedate_dt&facet.date.start=2005-05-01T00:00:00Z&facet.date.end=2006-05-00T00:00:00Z&facet.date.gap=%2B1MONTH

          The mincount parameter works properly, however, if I add the shards parameter (shards=localhost:8090/solr1,localhost:8090/solr2 or even shards=localhost:8090/solr1):
          http://localhost:8090/solr1/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&shards=localhost:8090/solr1,localhost:8090/solr2&facet=true&facet.mincount=1&facet.date=manufacturedate_dt&facet.date.start=2005-05-01T00:00:00Z&facet.date.end=2006-05-00T00:00:00Z&facet.date.gap=%2B1MONTH

          mincount parameter is being ignored retrieving facets with 0 results.

          Should the mincount parameter work as in the single search or it is not supported for date facets?

          Thanks in advance

          Show
          Alejandro Marqués added a comment - Should the facet.mincount parameter work with distributed date faceting? I have two Solr servers ( http://localhost:8090/solr1/ and http://localhost:8090/solr2/ ) each one with part of the docs from exampledocs If I make the next query: http://localhost:8090/solr1/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&facet=true&facet.mincount=1&facet.date=manufacturedate_dt&facet.date.start=2005-05-01T00:00:00Z&facet.date.end=2006-05-00T00:00:00Z&facet.date.gap=%2B1MONTH The mincount parameter works properly, however, if I add the shards parameter (shards=localhost:8090/solr1,localhost:8090/solr2 or even shards=localhost:8090/solr1): http://localhost:8090/solr1/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&shards=localhost:8090/solr1,localhost:8090/solr2&facet=true&facet.mincount=1&facet.date=manufacturedate_dt&facet.date.start=2005-05-01T00:00:00Z&facet.date.end=2006-05-00T00:00:00Z&facet.date.gap=%2B1MONTH mincount parameter is being ignored retrieving facets with 0 results. Should the mincount parameter work as in the single search or it is not supported for date facets? Thanks in advance
          Hide
          Simon Willnauer added a comment -

          backported to 3x in revision 1232359

          Show
          Simon Willnauer added a comment - backported to 3x in revision 1232359
          Hide
          Simon Willnauer added a comment -

          reopen for backport to 3x

          Show
          Simon Willnauer added a comment - reopen for backport to 3x
          Hide
          Simon Willnauer added a comment -

          here is a patch created from my merge. I checked other commits to FacetComponent to make sure I got all the subsequent changes that are relevant.

          I think this is ready, I will commit later today

          Show
          Simon Willnauer added a comment - here is a patch created from my merge. I checked other commits to FacetComponent to make sure I got all the subsequent changes that are relevant. I think this is ready, I will commit later today
          Hide
          Peter Sturge added a comment -

          The original patch was actually for 3.x, so it should be relatively easy to port it
          I suppose there's likely to be some code changes 'post 3.x', so these would be the only thing to bear in mind when [re] back porting.

          Many thanks!
          Peter

          Show
          Peter Sturge added a comment - The original patch was actually for 3.x, so it should be relatively easy to port it I suppose there's likely to be some code changes 'post 3.x', so these would be the only thing to bear in mind when [re] back porting. Many thanks! Peter
          Hide
          Simon Willnauer added a comment -

          I am planning to backport this to 3.x any objections - I will look into SOLR-1729 too

          Show
          Simon Willnauer added a comment - I am planning to backport this to 3.x any objections - I will look into SOLR-1729 too
          Hide
          Erick Erickson added a comment -

          Right, that's true. I think there's a script for changing patches to reflect the new packaging, see: https://issues.apache.org/jira/browse/SOLR-2452. It should be located in <solr_home>/dev-tools/scripts. But I confess I haven't used it so you probably want to look at it first.

          Erick

          Show
          Erick Erickson added a comment - Right, that's true. I think there's a script for changing patches to reflect the new packaging, see: https://issues.apache.org/jira/browse/SOLR-2452 . It should be located in <solr_home>/dev-tools/scripts. But I confess I haven't used it so you probably want to look at it first. Erick
          Hide
          Rohit Gupta added a comment -

          Hi Erick,

          I am not sure, but I think the package structure has changes in 3.4, that what is causing the problem.

          Show
          Rohit Gupta added a comment - Hi Erick, I am not sure, but I think the package structure has changes in 3.4, that what is causing the problem.
          Hide
          Erick Erickson added a comment -

          Rohit:

          What didn't work when you just tried to apply the patch to 3.4?

          Show
          Erick Erickson added a comment - Rohit: What didn't work when you just tried to apply the patch to 3.4?
          Hide
          Rohit Gupta added a comment -

          I guess i was working on the wrong file(under tags), checked out branch with the revision specified in the branch field and works fine.

          But I am migrating with 3.4 as suggested on the solr site, how can I get a patch for that version.

          Show
          Rohit Gupta added a comment - I guess i was working on the wrong file(under tags), checked out branch with the revision specified in the branch field and works fine. But I am migrating with 3.4 as suggested on the solr site, how can I get a patch for that version.
          Hide
          Rohit Gupta added a comment -

          I am trying to apply the patch to solr 3.1 and solr 3.4 but in both case its not working. While applying the patch on 3.1 I am getting the error that the patch is outdated.

          While in 3.4 the folder structure is different.

          Show
          Rohit Gupta added a comment - I am trying to apply the patch to solr 3.1 and solr 3.4 but in both case its not working. While applying the patch on 3.1 I am getting the error that the patch is outdated. While in 3.4 the folder structure is different.
          Hide
          David Smiley added a comment -

          IMO this can go on branch 3x. Users don't have to use "NOW" in date faceting, they can use an explicit date assuming they are even doing date faceting (vs numeric range faceting). And even if "NOW" is used, the bug condition is relatively minor (no crash or exception, just some slightly off numbers) and furthermore limited to a small time window assuming you round it like "NOW/DAY".

          Show
          David Smiley added a comment - IMO this can go on branch 3x. Users don't have to use "NOW" in date faceting, they can use an explicit date assuming they are even doing date faceting (vs numeric range faceting). And even if "NOW" is used, the bug condition is relatively minor (no crash or exception, just some slightly off numbers) and furthermore limited to a small time window assuming you round it like "NOW/DAY".
          Hide
          Peter Sturge added a comment -

          Yes, the deprecation story makes sense.

          Regarding SOLR-1729, I'm pretty sure this already works for 3x (it was originally created on/for the 3x branch). I guess Yonik's NOW changes were destined for trunk, but I've been using the current SOLR-1729 patch on 3x branch and is working fine in production environments.

          Thanks
          Peter

          Show
          Peter Sturge added a comment - Yes, the deprecation story makes sense. Regarding SOLR-1729 , I'm pretty sure this already works for 3x (it was originally created on/for the 3x branch). I guess Yonik's NOW changes were destined for trunk, but I've been using the current SOLR-1729 patch on 3x branch and is working fine in production environments. Thanks Peter
          Hide
          Hoss Man added a comment -

          Deprecating data faceting in favour of generic range faceting should be fine...

          that ship has sailed, it was already deprecated in 3.1. facet.range supports everything facet.date supported, with nearly identical query params, and a slightly different response structure (it's now easier to parse because the counts are distinct from the meta values)

          my point was just that since facet.date has alreayd been deprecated, it made no sense to add distributed support for it unlesswe also added distributed support for facet.rang – but since you already did the work, there's no harm in having both.

          I've committed this to trunk for 4x. as i mentioned before, i'm leary of backporting to 3x unless SOLR-1729 is also backported. so i'm resolving for now and we can reopen if/when that happens

          Thanks to Peter and David for their perseverance and prodding (and tests!) to help get this committed.

          Committed revision 1095517.

          Show
          Hoss Man added a comment - Deprecating data faceting in favour of generic range faceting should be fine... that ship has sailed, it was already deprecated in 3.1. facet.range supports everything facet.date supported, with nearly identical query params, and a slightly different response structure (it's now easier to parse because the counts are distinct from the meta values) my point was just that since facet.date has alreayd been deprecated, it made no sense to add distributed support for it unlesswe also added distributed support for facet.rang – but since you already did the work, there's no harm in having both. I've committed this to trunk for 4x. as i mentioned before, i'm leary of backporting to 3x unless SOLR-1729 is also backported. so i'm resolving for now and we can reopen if/when that happens Thanks to Peter and David for their perseverance and prodding (and tests!) to help get this committed. Committed revision 1095517.
          Hide
          Peter Sturge added a comment -

          Updating ResponseBuilder rather than FacetInfo really came from tracing the references through the hierarchy - so, I don't think anything is missed by moving this to FacetInfo props, and should provide better encapsulation.
          Deprecating data faceting in favour of generic range faceting should be fine, as long as there exists a clear path to easily move from 'the way we were' with date facets, to 'the way it will be' (range faceting). It would be a shame to break clients that rely on the existing date facet parameters/syntax, so I guess if they're mapped to range (I think some of this is in 3.x already?), that would be good.

          Thanks

          Show
          Peter Sturge added a comment - Updating ResponseBuilder rather than FacetInfo really came from tracing the references through the hierarchy - so, I don't think anything is missed by moving this to FacetInfo props, and should provide better encapsulation. Deprecating data faceting in favour of generic range faceting should be fine, as long as there exists a clear path to easily move from 'the way we were' with date facets, to 'the way it will be' (range faceting). It would be a shame to break clients that rely on the existing date facet parameters/syntax, so I guess if they're mapped to range (I think some of this is in 3.x already?), that would be good. Thanks
          Hide
          Hoss Man added a comment -

          I just realized SOLR-1729 is only on trunk, and backporting it to 3x may be kind of invasive.

          so i'm removing 3.2 from the fix version for now .. we can revisit it, but i'm concerned about including this patch as it stands w/o the universal "NOW" concept.

          obviously that patch can still be used on 3x w/o the distributed NOW support (particularly now that it works with facet.range) but we might want to make it more assertive about dealing with shards that return inconsistent ranges

          Show
          Hoss Man added a comment - I just realized SOLR-1729 is only on trunk, and backporting it to 3x may be kind of invasive. so i'm removing 3.2 from the fix version for now .. we can revisit it, but i'm concerned about including this patch as it stands w/o the universal "NOW" concept. obviously that patch can still be used on 3x w/o the distributed NOW support (particularly now that it works with facet.range) but we might want to make it more assertive about dealing with shards that return inconsistent ranges
          Hide
          Hoss Man added a comment -

          I finally got a chacne to really look at this today. The deprecation of date faceting is obviously a big factor here – but thanks to the work peter and david already did, i was able to figure out how to enhance the code and tests to also work for the newer generic range faceting (believe it or not, i've never looked at the internals of any of the distributed components until now)

          Overview of changes in this patch...

          • applies to trunk (but i expect backporting this to 3x should be straight forward)
          • eliminate some warnings by using stronger typing on the generics
          • rename some variables for simpler readability
          • eliminate the new public methods in ResponseBuilder, use new properties in FacetInfo instead (this seems more consistent with the existing distributed faceting code)
          • adds facet_range support
          • adds some more tests

          ...any comments/concerns?

          (Peter: was there any particular reason you chose to update ResponseBuilder directly instead of using FacetInfo? i want to make sure i'm not overlooking something)

          Show
          Hoss Man added a comment - I finally got a chacne to really look at this today. The deprecation of date faceting is obviously a big factor here – but thanks to the work peter and david already did, i was able to figure out how to enhance the code and tests to also work for the newer generic range faceting (believe it or not, i've never looked at the internals of any of the distributed components until now) Overview of changes in this patch... applies to trunk (but i expect backporting this to 3x should be straight forward) eliminate some warnings by using stronger typing on the generics rename some variables for simpler readability eliminate the new public methods in ResponseBuilder, use new properties in FacetInfo instead (this seems more consistent with the existing distributed faceting code) adds facet_range support adds some more tests ...any comments/concerns? (Peter: was there any particular reason you chose to update ResponseBuilder directly instead of using FacetInfo? i want to make sure i'm not overlooking something)
          Hide
          Bill Bell added a comment -

          1 vote for 3.1

          Show
          Bill Bell added a comment - 1 vote for 3.1
          Hide
          Peter Sturge added a comment -

          Hi David,

          Thank you thank you thank you for working on this and providing tests - your efforts are very much appreciated!

          For deprecation of facet.date, I suspect it probably shouldn't be deprecated until a fully-fledged replacement is ready, ported and committed, but if SOLR-1240 can functionally slot-in (including the 'NOW' stuff in SOLR-1729), that's great.

          Many thanks,
          Peter

          Show
          Peter Sturge added a comment - Hi David, Thank you thank you thank you for working on this and providing tests - your efforts are very much appreciated! For deprecation of facet.date, I suspect it probably shouldn't be deprecated until a fully-fledged replacement is ready, ported and committed, but if SOLR-1240 can functionally slot-in (including the 'NOW' stuff in SOLR-1729 ), that's great. Many thanks, Peter
          Hide
          David Smiley added a comment -

          This is a patch for v3.1. It includes a test. Thanks to Solr's excellent test infrastructure, it was actually very easy to test.

          Now that said, I suspect that facet.date is going to be deprecated in favor of facet.range (which supports dates and numbers) – SOLR-1240. Facet.range does not support faceting yet. I'll take a look at porting this patch and providing a test.

          Show
          David Smiley added a comment - This is a patch for v3.1. It includes a test. Thanks to Solr's excellent test infrastructure, it was actually very easy to test. Now that said, I suspect that facet.date is going to be deprecated in favor of facet.range (which supports dates and numbers) – SOLR-1240 . Facet.range does not support faceting yet. I'll take a look at porting this patch and providing a test.
          Hide
          Peter Sturge added a comment -

          Hi David,

          Yes, at the time my patching wasn't working (Windows env for my sins), so I thought it would be better to make the source available than not. Thomas H. kindly did turn it into a udiff patch last year.
          I agree it would be good to include this functionality (along with SOLR-1729 + Yonik's recent 'NOW' changes).
          I have a product release coming up in a few weeks, so I won't have many cycles before then. Of course it would be great if you have any time to invest making this more 'commitable'.
          I admit because I'm not a Solr commiter, I'm not as familiar with the requirements. If you can let me know the 'missing elements', I'm happy to look at contributing what's needed, or if you prefer, divide up the tasks that need doing.

          Many thanks,
          Peter

          Show
          Peter Sturge added a comment - Hi David, Yes, at the time my patching wasn't working (Windows env for my sins), so I thought it would be better to make the source available than not. Thomas H. kindly did turn it into a udiff patch last year. I agree it would be good to include this functionality (along with SOLR-1729 + Yonik's recent 'NOW' changes). I have a product release coming up in a few weeks, so I won't have many cycles before then. Of course it would be great if you have any time to invest making this more 'commitable'. I admit because I'm not a Solr commiter, I'm not as familiar with the requirements. If you can let me know the 'missing elements', I'm happy to look at contributing what's needed, or if you prefer, divide up the tasks that need doing. Many thanks, Peter
          Hide
          David Smiley added a comment -

          I'm just trying to prod this along so that it can get committed. One problem is that you didn't provide a patch here, you provided whole source files. Long ago in 2010 I carefully figured out what changes you did and patched my Solr trunk. Using the same patch I created for that, I did it by hand for 3x just now. And there are no tests. If you don't have an interest in pushing this along further than I could invest some effort in a couple weeks or so to provide missing elements that should make it amicable to committers. I could also make it work for numeric ranges which looks straight forward.

          Show
          David Smiley added a comment - I'm just trying to prod this along so that it can get committed. One problem is that you didn't provide a patch here, you provided whole source files. Long ago in 2010 I carefully figured out what changes you did and patched my Solr trunk. Using the same patch I created for that, I did it by hand for 3x just now. And there are no tests. If you don't have an interest in pushing this along further than I could invest some effort in a couple weeks or so to provide missing elements that should make it amicable to committers. I could also make it work for numeric ranges which looks straight forward.
          Hide
          Peter Sturge added a comment -

          Hi David,

          I thought this was for the 3x branch, but I still get confused since the all the names changed.
          Are you finding this patch not working on 3x? (remember also to apply SOLR-1729)

          I've not tried this on trunk, as there has been a lot of flux in this area in the recent past. As I've not tried it, I'm not sure how this patch works on trunk - possibly requiring a manual merge.

          Peter

          Show
          Peter Sturge added a comment - Hi David, I thought this was for the 3x branch, but I still get confused since the all the names changed. Are you finding this patch not working on 3x? (remember also to apply SOLR-1729 ) I've not tried this on trunk, as there has been a lot of flux in this area in the recent past. As I've not tried it, I'm not sure how this patch works on trunk - possibly requiring a manual merge. Peter
          Hide
          David Smiley added a comment -

          Peter, do you have any inclination to port this to 3x or trunk which is using the new numerical range faceting for doing double duty for numbers & dates?

          Show
          David Smiley added a comment - Peter, do you have any inclination to port this to 3x or trunk which is using the new numerical range faceting for doing double duty for numbers & dates?
          Hide
          Peter Sturge added a comment -

          It's a good idea to apply SOLR-1729 in any case, as it caters for any
          time skew in documents and between machines.
          Without it, result counts 'on the edges' could be incorrect. 1729 is
          quite 'passive', in that if you don't specify a 'FACET_DATE_NOW'
          parameter int he request, it runs as without the patch.

          In terms of readiness, we've been using these patches in production
          environments for months now. (We use it with the 3.x trunk branch)

          Yonik, et al. were talking about a more general update with regards
          how NOW is configured on a machine (since it is used in places other
          than just date facets), and this is the
          'extra' work to be done, but things work fine as they are for disti
          date faceting.

          Thanks,
          Peter

          Show
          Peter Sturge added a comment - It's a good idea to apply SOLR-1729 in any case, as it caters for any time skew in documents and between machines. Without it, result counts 'on the edges' could be incorrect. 1729 is quite 'passive', in that if you don't specify a 'FACET_DATE_NOW' parameter int he request, it runs as without the patch. In terms of readiness, we've been using these patches in production environments for months now. (We use it with the 3.x trunk branch) Yonik, et al. were talking about a more general update with regards how NOW is configured on a machine (since it is used in places other than just date facets), and this is the 'extra' work to be done, but things work fine as they are for disti date faceting. Thanks, Peter
          Hide
          Peter Karich added a comment -

          Hi Peter,

          sorry for getting so late back.

          I'm relative sure now that I'll need that patch (also Jake from solandra was asking when this patch will be ready )

          So, I will need to apply SOLR-1729 and then this patch to the 3x branch or even without SOLR-1729 (not necessary in my case)?

          Regards,
          Peter.

          Show
          Peter Karich added a comment - Hi Peter, sorry for getting so late back. I'm relative sure now that I'll need that patch (also Jake from solandra was asking when this patch will be ready ) So, I will need to apply SOLR-1729 and then this patch to the 3x branch or even without SOLR-1729 (not necessary in my case)? Regards, Peter.
          Hide
          Peter Sturge added a comment -

          Hi Peter,

          Thanks for your message. There's of course the issue of 'now' as described in some of the above comments. This is perhaps a little ancillary to this issue, but not totally irrelevant.

          The issue of time zone/skew on distributed shards is currently handled by SOLR-1729 by passing a 'facet.date.now=<epochtime>' parameter in the search query. This is then used by the particapating shards to use as 'now'. Of course, there are a number of ways to skin that one, but this is a straightforward solution that is backward compatible and still easy to implement in client code.

          Note that the facet.date.now change is not part of this patch - see SOLR-1729 for a separate patch for this parameter. (kept separate because it's, strictly speaking, a separate issue generally for distributed search)

          It's not that eariler/later aren't supported - the date facet 'edges' are fine, it's just the patch will 'quantize the ends' of the start/end date facets if the time is skewed from the calling server. This is where SOLR-1729 comes into play, so that this doesn't happen.

          As this is a pre-3x/4x branch patch, the testing is a bit limited on the latest trunk(s). Having said that, I have this (and SOLR-1729) building/running fine on my svn 3x branch release copy.
          Any other questions, or info you need, please do let me know.

          Thanks!
          Peter

          Show
          Peter Sturge added a comment - Hi Peter, Thanks for your message. There's of course the issue of 'now' as described in some of the above comments. This is perhaps a little ancillary to this issue, but not totally irrelevant. The issue of time zone/skew on distributed shards is currently handled by SOLR-1729 by passing a 'facet.date.now=<epochtime>' parameter in the search query. This is then used by the particapating shards to use as 'now'. Of course, there are a number of ways to skin that one, but this is a straightforward solution that is backward compatible and still easy to implement in client code. Note that the facet.date.now change is not part of this patch - see SOLR-1729 for a separate patch for this parameter. (kept separate because it's, strictly speaking, a separate issue generally for distributed search) It's not that eariler/later aren't supported - the date facet 'edges' are fine, it's just the patch will 'quantize the ends' of the start/end date facets if the time is skewed from the calling server. This is where SOLR-1729 comes into play, so that this doesn't happen. As this is a pre-3x/4x branch patch, the testing is a bit limited on the latest trunk(s). Having said that, I have this (and SOLR-1729 ) building/running fine on my svn 3x branch release copy. Any other questions, or info you need, please do let me know. Thanks! Peter
          Hide
          Peter Karich added a comment -

          Hi Peter Sturge,

          what are the limitations of this patch? only that earlier + later isn't supported?

          What are the issues before commiting this into trunk?

          Show
          Peter Karich added a comment - Hi Peter Sturge, what are the limitations of this patch? only that earlier + later isn't supported? What are the issues before commiting this into trunk?
          Hide
          Thomas Hammerl added a comment -

          Hi Peter!

          Thanks for your advice! I have simply removed the introduced _termsHelper member variable in ResponseBuilder.java from the patch since it was not used anywhere. Now everything compiles fine with the code base from http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.0/. I have attached a patch file in unified diff format.

          Show
          Thomas Hammerl added a comment - Hi Peter! Thanks for your advice! I have simply removed the introduced _termsHelper member variable in ResponseBuilder.java from the patch since it was not used anywhere. Now everything compiles fine with the code base from http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.0/ . I have attached a patch file in unified diff format.
          Hide
          Peter Sturge added a comment -

          Hi Thomas,

          Hmmm...TermsHelper is an inner class inside TermsComponent.
          In the code base that I have, this class exists within TermsComponent. I've just had a look on the http://mirrors.dedipower.com/ftp.apache.org/lucene/solr/1.4.0/ mirror, and the TermsComponent doesn't have this inner class.

          Not sure where the difference is, as I would have got my codebase from the same set of mirrors as you (unless some mirrors are out-of-sync?).

          TermsComponent hasn't changed in this patch, so I don't know much about this class. One thing to try is to diff the 2 files above with your 1.4 codebase, and merge the changes into your codebase. The differences should be very easy to see.

          This does highlight the very good policy for putting patch files as attachments rather than source files. This is my fault, as we don't use svn in our (win) environment, and Tortoise SVN crashes explorer64, so i'm not able to make compatible diff files - sorry.

          If you do create a couple of diff files, it would be very kind of you if you could post it up on this issue for others?

          Thanks!

          Show
          Peter Sturge added a comment - Hi Thomas, Hmmm...TermsHelper is an inner class inside TermsComponent. In the code base that I have, this class exists within TermsComponent. I've just had a look on the http://mirrors.dedipower.com/ftp.apache.org/lucene/solr/1.4.0/ mirror, and the TermsComponent doesn't have this inner class. Not sure where the difference is, as I would have got my codebase from the same set of mirrors as you (unless some mirrors are out-of-sync?). TermsComponent hasn't changed in this patch, so I don't know much about this class. One thing to try is to diff the 2 files above with your 1.4 codebase, and merge the changes into your codebase. The differences should be very easy to see. This does highlight the very good policy for putting patch files as attachments rather than source files. This is my fault, as we don't use svn in our (win) environment, and Tortoise SVN crashes explorer64, so i'm not able to make compatible diff files - sorry. If you do create a couple of diff files, it would be very kind of you if you could post it up on this issue for others? Thanks!
          Hide
          Thomas Hammerl added a comment -

          Unfortunately, I am not able to apply this patch. I get the following compile error:

          [javac] /home/systemone/Desktop/solr-1.4/src/java/org/apache/solr/handler/component/ResponseBuilder.java:138: cannot find symbol
          [javac] symbol  : class TermsHelper
          [javac] location: class org.apache.solr.handler.component.TermsComponent
          [javac]   TermsComponent.TermsHelper _termsHelper;
          [javac]
          

          What I've done basically is download the attached sources and apply the following commands:

          svn co http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.0/ solr-1.4.0
          cp FacetComponent.java solr-1.4.0/src/java/org/apache/solr/handler/component/
          cp ResponseBuilder.java solr-1.4.0/src/java/org/apache/solr/handler/component/
          cd solr-1.4.0
          ant package
          

          I also tried to apply the patch to the 1.4 branch at http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/ resulting in the same compile error.

          Any help would be very appreciated.

          Show
          Thomas Hammerl added a comment - Unfortunately, I am not able to apply this patch. I get the following compile error: [javac] /home/systemone/Desktop/solr-1.4/src/java/org/apache/solr/handler/component/ResponseBuilder.java:138: cannot find symbol [javac] symbol : class TermsHelper [javac] location: class org.apache.solr.handler.component.TermsComponent [javac] TermsComponent.TermsHelper _termsHelper; [javac] What I've done basically is download the attached sources and apply the following commands: svn co http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.0/ solr-1.4.0 cp FacetComponent.java solr-1.4.0/src/java/org/apache/solr/handler/component/ cp ResponseBuilder.java solr-1.4.0/src/java/org/apache/solr/handler/component/ cd solr-1.4.0 ant package I also tried to apply the patch to the 1.4 branch at http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/ resulting in the same compile error. Any help would be very appreciated.
          Hide
          Peter Sturge added a comment -

          Updated version of FacetComponent.java after more testing and sync with FacetParams.FACET_DATE_NOW (see SOLR-1729).
          For use with the 1.4 trunk (along with the existing ResponseBuilder.java in this patch).

          Show
          Peter Sturge added a comment - Updated version of FacetComponent.java after more testing and sync with FacetParams.FACET_DATE_NOW (see SOLR-1729 ). For use with the 1.4 trunk (along with the existing ResponseBuilder.java in this patch).
          Hide
          Peter Sturge added a comment -

          Yonik,

          Yes, I can see what you mean that of course NOW will affect anything date-related to a given query.
          I'm wondering whether the passing of 'NOW' to shards should be a separate issue/patch from this one (e.g. something like 'Time Sync to Remote Shards'), as its scope and ramifications go far beyond simply distributed date faceting.
          The whole area of code relating to date math is one that I'm not familiar with, but do let me know if there's anything you'd like me to look at.

          Show
          Peter Sturge added a comment - Yonik, Yes, I can see what you mean that of course NOW will affect anything date-related to a given query. I'm wondering whether the passing of 'NOW' to shards should be a separate issue/patch from this one (e.g. something like 'Time Sync to Remote Shards'), as its scope and ramifications go far beyond simply distributed date faceting. The whole area of code relating to date math is one that I'm not familiar with, but do let me know if there's anything you'd like me to look at.
          Hide
          Yonik Seeley added a comment -

          Seems useful enough that setting NOW should be advertised (i.e. not just an internal call). For example, it would be a convenient way to keep the rest of your request the same, but check how the current date affects your date boosting strategies. NOW isn't just for date faceting, but for anything that uses date math.

          As for the format, 20091231 is ambiguous if you want flexible dates... is it a date or milliseconds?
          I first thought of a prefix (ms:123456789) but it makes it look like a field query.
          It might be safest to make it unambiguous somehow... postfix with ms? 123456789ms

          Show
          Yonik Seeley added a comment - Seems useful enough that setting NOW should be advertised (i.e. not just an internal call). For example, it would be a convenient way to keep the rest of your request the same, but check how the current date affects your date boosting strategies. NOW isn't just for date faceting, but for anything that uses date math. As for the format, 20091231 is ambiguous if you want flexible dates... is it a date or milliseconds? I first thought of a prefix (ms:123456789) but it makes it look like a field query. It might be safest to make it unambiguous somehow... postfix with ms? 123456789ms
          Hide
          Peter Sturge added a comment -

          Definitely true! – messing about with Date strings isn't great for performance.

          As the NOW parameter would be for internal request use only (i.e. not for the indexer, not for human consumption), could it not just be an epoch long? The adjustment math should then be nice and quick (no string/date parsing/formatting; at worst just one Date.getTimeInMillis() call if the time is stored locally as a string).

          Show
          Peter Sturge added a comment - Definitely true! – messing about with Date strings isn't great for performance. As the NOW parameter would be for internal request use only (i.e. not for the indexer, not for human consumption), could it not just be an epoch long? The adjustment math should then be nice and quick (no string/date parsing/formatting; at worst just one Date.getTimeInMillis() call if the time is stored locally as a string).
          Hide
          Yonik Seeley added a comment -

          Date formatting and parsing also tend to be surprisingly expensive.
          So if we support passing NOW as a date string, it would be nice to also support standard milliseconds. That can also be easier for clients to generate rather than trying to figure out how to get the correct date format. Perhaps that should even be an addition to the standard datemath syntax.

          Show
          Yonik Seeley added a comment - Date formatting and parsing also tend to be surprisingly expensive. So if we support passing NOW as a date string, it would be nice to also support standard milliseconds. That can also be easier for clients to generate rather than trying to figure out how to get the correct date format. Perhaps that should even be an addition to the standard datemath syntax.
          Hide
          Yonik Seeley added a comment -

          I haven't checked the patch, but it seems like we should take a generic approach to NOW...
          The first time NOW is used anywhere in the request (and is not passed in as a request argument), either a thread local or something in the request context should be set to the current time. Subsequent references to NOW would yield the first value set.
          This would allow NOW to be referenced more than once in the same request with consistent results.

          Passing in "NOW" as a request parameter would simply set it explicitly... the question is, who (which solr component) should be responsible for that?

          Show
          Yonik Seeley added a comment - I haven't checked the patch, but it seems like we should take a generic approach to NOW... The first time NOW is used anywhere in the request (and is not passed in as a request argument), either a thread local or something in the request context should be set to the current time. Subsequent references to NOW would yield the first value set. This would allow NOW to be referenced more than once in the same request with consistent results. Passing in "NOW" as a request parameter would simply set it explicitly... the question is, who (which solr component) should be responsible for that?
          Hide
          Hoss Man added a comment -

          Requesters would include an optional parameter that told remote shards what time to use as 'NOW', and which TZ to use for date faceting. This would avoid having to translate loads of time strings at merge time.

          I was thinking the same thing ... as long as the "coordinator" evaluated any DateMath in the facet.date.start and facet.date.end params before executing the sub-requests to the shards, the ranges coming back from the individual shards should all be in sync.

          Show
          Hoss Man added a comment - Requesters would include an optional parameter that told remote shards what time to use as 'NOW', and which TZ to use for date faceting. This would avoid having to translate loads of time strings at merge time. I was thinking the same thing ... as long as the "coordinator" evaluated any DateMath in the facet.date.start and facet.date.end params before executing the sub-requests to the shards, the ranges coming back from the individual shards should all be in sync.
          Hide
          Peter Sturge added a comment -

          Sorry, guys, can't get svn to create a patch file correctly on windows, so I'm attaching the source files here. With some time, which at the moment I don't have, I'm sure I could get svn working. Rather than anyone have to wait for me to get the patch file created, I thought it best to get the source uploaded, so people can start using it.
          Thanks, Peter

          Show
          Peter Sturge added a comment - Sorry, guys, can't get svn to create a patch file correctly on windows, so I'm attaching the source files here. With some time, which at the moment I don't have, I'm sure I could get svn working. Rather than anyone have to wait for me to get the patch file created, I thought it best to get the source uploaded, so people can start using it. Thanks, Peter
          Hide
          Peter Sturge added a comment -

          I've heard of Tortoise, I'll give that a try, thanks.

          On the time-zone/skew issue, perhaps a more efficient approach would be a 'push' rather than 'pull' - i.e.:

          Requesters would include an optional parameter that told remote shards what time to use as 'NOW', and which TZ to use for date faceting.
          This would avoid having to translate loads of time strings at merge time.

          Thanks,
          Peter

          Show
          Peter Sturge added a comment - I've heard of Tortoise, I'll give that a try, thanks. On the time-zone/skew issue, perhaps a more efficient approach would be a 'push' rather than 'pull' - i.e.: Requesters would include an optional parameter that told remote shards what time to use as 'NOW', and which TZ to use for date faceting. This would avoid having to translate loads of time strings at merge time. Thanks, Peter
          Hide
          Jason Rutherglen added a comment -

          Tim,

          Thanks for the patch...

          as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company).

          TortoiseSVN works well on Windows, even for creating patches. Have you tried it?

          Show
          Jason Rutherglen added a comment - Tim, Thanks for the patch... as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company). TortoiseSVN works well on Windows, even for creating patches. Have you tried it?

            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Peter Sturge
            • Votes:
              2 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development