SOLR-397

options for dealing with range endpoints in date facets

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      Date faceting should support configuration for controlling how edge boundaries are dealt with.

      1. SOLR-397.patch
        25 kB
        Hoss Man
      2. SOLR-397.patch
        24 kB
        Hoss Man

        Issue Links

          Activity

          Hoss Man added a comment -

          (initial issue description moved to comment)

          as discussed in email...
          http://www.nabble.com/Re%3A-Date-facetting-and-ranges-overlapping-p12928374.html

          : I'm now using date faceting to browse events. It works really fine
          : and is really useful. The only problem so far is that if I have an
          : event which is exactly on the boundary of two ranges, it is referenced
          : 2 times.

          yeah, this is one of the big caveats with date faceting right now ... i
          struggled with this a bit when designing it, and ultimately decided to
          punt on the issue. the biggest hangup was that even if the facet counting
          code was smart about making sure the ranges don't overlap, the range query
          syntax in the QueryParser doesn't support ranges that exclude one input
          (so there wouldn't be a lot you can do with the ranges once you know the
          counts in them)

          one idea i had in SOLR-258 was that we could add an "interval" option that
          would define how much to add to the "end" of one range to get the "start"
          of another range (think of the current implementation having interval
          hardcoded to "0") which would solve the problem and work with range
          queries that were inclusive of both endpoints, but would require people to
          use "-1MILLI" a lot.

          a better option (assuming a query parser change) would be a new option
          that says whether each computed range should be inclusive of the low point,
          the high point, both end points, neither end point, or be "smart" (where
          smart is the same as "low" except for the last range, where it includes
          both)

          (I think there's already a lucene issue to add the query parser support, i
          just haven't had time to look at it)

          The simple workaround: if you know all of your data is indexed with
          perfect 0.000second precision, then put "-1MILLI" at the end of your start
          and end date faceting params.
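
          To illustrate why the workaround helps, here is a small sketch in Python (not Solr code) that counts timestamps into ranges that include both endpoints, first with the boundaries as given and then with every boundary moved back by one millisecond. The `count_inclusive` helper and the sample events are hypothetical, and the sketch assumes, as stated above, that all data has whole-second precision.

```python
from datetime import datetime, timedelta

def count_inclusive(values, start, end, gap):
    """Count values into consecutive ranges [lo, hi], both ends inclusive."""
    counts = []
    lo = start
    while lo < end:
        hi = min(lo + gap, end)
        counts.append(sum(1 for v in values if lo <= v <= hi))
        lo = hi
    return counts

# Hypothetical events, all at whole-second precision.
events = [
    datetime(2007, 1, 1, 12, 0, 0),
    datetime(2007, 1, 2, 0, 0, 0),   # exactly on a range boundary
    datetime(2007, 1, 2, 18, 30, 0),
]
start = datetime(2007, 1, 1)
end = datetime(2007, 1, 3)
gap = timedelta(days=1)
shift = timedelta(milliseconds=1)  # plays the role of "-1MILLI"

# Boundaries as given: the midnight event lands in both ranges.
plain = count_inclusive(events, start, end, gap)
# Boundaries shifted back 1 ms: no whole-second value can sit on a boundary.
shifted = count_inclusive(events, start - shift, end - shift, gap)

print(plain)
print(shifted)
```

          With the unshifted boundaries the counts add up to more than the number of events, because the midnight event is counted twice. With the shifted boundaries each event is counted once.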

          Hoss Man added a comment -

          Additional idea that i like much better than the "interval" idea i had a while back, transcribed from email so it's not lost to the ages...

          I think the semantics that might make the most sense is to add a
          multivalued "facet.date.include" param that supports the following
          options: all, lower, upper, edge, outer

          • "all" is shorthand for lower,upper,edge,outer and is the default (for back compat)
          • if "lower" is specified, then all ranges include their lower bound
          • if "upper" is specified, then all ranges include their upper bound
          • if "edge" is specified, then the first and last ranges include their edge bounds (ie: lower for the first one, upper for the last one) even if the corresponding "upper"/"lower" option is not specified.
          • the "between" count is inclusive of each of the start and end bounds iff the first and last range are inclusive of them
          • the "before" and "after" ranges are inclusive of their respective bounds if:
            • "outer" is specified ... OR ...
            • the first and last ranges don't already include them

          so assuming you started with something like (specific dates and durations shortened for readability)...

          facet.date.start=1 & facet.date.end=3 & facet.date.gap=+1 & facet.date.other=all

          ...your ranges would be...

          [1 TO 2], [2 TO 3] and [* TO 1], [1 TO 3], [3 TO *]

          The following params would change the ranges in the following ways...

          w/ facet.date.include=lower ...
            [1 TO 2}, [2 TO 3} and [* TO 1}, [1 TO 3}, [3 TO *]
          
          w/ facet.date.include=upper ...
            {1 TO 2], {2 TO 3] and [* TO 1], {1 TO 3], {3 TO *]
          
          w/ facet.date.include=lower&facet.date.include=edge ...
            [1 TO 2}, [2 TO 3] and [* TO 1}, [1 TO 3], {3 TO *]
          
          w/ facet.date.include=upper&facet.date.include=edge ...
            [1 TO 2], {2 TO 3] and [* TO 1}, [1 TO 3], {3 TO *]
          
          w/ facet.date.include=upper&facet.date.include=outer ...
            {1 TO 2], {2 TO 3] and [* TO 1], {1 TO 3], [3 TO *]
          
          ...etc.
          

          initial proposal: http://old.nabble.com/RE%3A-Date-Facet-duplicate-counts-p27331578.html
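
          The rules proposed above can be sketched in a few lines of Python. This is only an illustration of the semantics as described in this comment, not the code from the patch. The `facet_ranges` function and its signature are hypothetical, and plain integers stand in for dates and gaps as in the shortened examples.

```python
def facet_ranges(start, end, gap, include=("all",)):
    """Return (ranges, before, between, after) as range strings.

    '[' / ']' mark an inclusive bound, '{' / '}' an exclusive one.
    """
    inc = set(include)
    if "all" in inc:
        inc = {"lower", "upper", "edge", "outer"}

    def fmt(lo, hi, incl_lo, incl_hi):
        left = "[" if incl_lo else "{"
        right = "]" if incl_hi else "}"
        return f"{left}{lo} TO {hi}{right}"

    ranges = []
    lo = start
    while lo < end:
        hi = min(lo + gap, end)
        # "edge" only affects the outermost bound of the first/last range.
        incl_lo = "lower" in inc or ("edge" in inc and lo == start)
        incl_hi = "upper" in inc or ("edge" in inc and hi == end)
        ranges.append(fmt(lo, hi, incl_lo, incl_hi))
        lo = hi

    # Does the first range include start / the last range include end?
    first_incl_start = "lower" in inc or "edge" in inc
    last_incl_end = "upper" in inc or "edge" in inc

    # before/after include their bound if "outer" is given, or if the
    # first/last range does not already include it.
    before = fmt("*", start, True, "outer" in inc or not first_incl_start)
    between = fmt(start, end, first_incl_start, last_incl_end)
    after = fmt(end, "*", "outer" in inc or not last_incl_end, True)
    return ranges, before, between, after


for include in (["all"], ["lower"], ["upper"], ["lower", "edge"],
                ["upper", "edge"], ["upper", "outer"]):
    print(include, facet_ranges(1, 3, 1, include))
```

          Run against start=1, end=3, gap=1, this reproduces the range listings given in the examples above for each combination of options.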

          Hoss Man added a comment -

          Spurred on by recent email threads about this, i sat down and got the previously mentioned design working (with tests!)

          this patch implements the "facet.date.include" param mentioned above with the specified semantics. the only change is that i discovered facet.date.other=before and facet.date.other=after don't currently include the start/end (respectively) range boundaries ... so i made the default for facet.date.include be [lower,upper,edge] for back compatibility.

          I think this approach makes more sense than the proposal in SOLR-1402 because these semantics make it easy to always get a series of ranges (including the before/after ranges) that are "adjacent" w/o overlapping (using either [lower], [lower,edge], [upper], or [upper,edge])
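
          As a rough sketch of how a client might ask for adjacent, non-overlapping ranges, the Python snippet below builds a query string that passes the multivalued facet.date.include param twice (lower and edge). The field name `event_date`, the `/solr/select` path, and the particular start/end/gap values are assumptions made up for illustration; only the facet.date.* parameter names come from this issue.

```python
from urllib.parse import urlencode

# A list of pairs (rather than a dict) lets the same param appear twice.
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.date", "event_date"),        # hypothetical field name
    ("facet.date.start", "NOW/YEAR"),
    ("facet.date.end", "NOW/YEAR+1YEAR"),
    ("facet.date.gap", "+1MONTH"),
    ("facet.date.other", "all"),
    ("facet.date.include", "lower"),     # every range includes its lower bound
    ("facet.date.include", "edge"),      # last range also includes the end
]

query_string = urlencode(params)
print("/solr/select?" + query_string)    # hypothetical handler path
```

          Note that urlencode percent-encodes the leading "+" of the gap, which matters because a raw "+" in a query string is otherwise decoded as a space.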

          Hoss Man added a comment -

          updated patch to reflect some changes yonik committed to the test

          i plan to commit soon .. my only hesitation is the names of the options, but those can be changed easily.

          Hoss Man added a comment -

          Committed revision 940556.

          Hoss Man added a comment -

          Correcting Fix Version based on CHANGES.txt, see this thread for more details...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Hoss Man added a comment -

          reopening to track backport to 3x

          Hoss Man added a comment -

          Committed revision 980604. - merged to 3x branch

          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release


            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved:
