Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: search
    • Labels:
      None

      Description

      For faceting numerical ranges using many facet.query query arguments leads to unmanageably large queries as the fields you facet over increase. Adding the same faceting parameter for numbers which already exists for dates should fix this.

      1. SOLR-1240.include-default-lower.patch
        7 kB
        Hoss Man
      2. SOLR-1240.patch
        71 kB
        Hoss Man
      3. SOLR-1240.patch
        71 kB
        Hoss Man
      4. SOLR-1240.patch
        71 kB
        Hoss Man
      5. SOLR-1240.patch
        41 kB
        Hoss Man
      6. SOLR-1240.patch
        33 kB
        Hoss Man
      7. SOLR-1240.patch
        23 kB
        Hoss Man
      8. SOLR-1240.patch
        12 kB
        Hoss Man
      9. SOLR-1240.patch
        11 kB
        Gijs Kunze
      10. SOLR-1240.patch
        12 kB
        Gijs Kunze
      11. SOLR-1240.use-nl.patch
        0.6 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          Gijs Kunze added a comment -

          My first try, my Java skill are not that great but it seems to work.

          Test org.apache.solr.TestDistributedSearch seems to fail with this patch, I couldn't find out why exactly, my dev environment is not setup for Java debugging. I'm hoping it's simply because an extra lst tag is included in the facet results.

          Show
          Gijs Kunze added a comment - My first try, my Java skill are not that great but it seems to work. Test org.apache.solr.TestDistributedSearch seems to fail with this patch, I couldn't find out why exactly, my dev environment is not setup for Java debugging. I'm hoping it's simply because an extra lst tag is included in the facet results.
          Hide
          Erik Hatcher added a comment -

          Let's please stop adding stuff to SimpleFacets. See SOLR-792 for an approach of adding additional faceting components instead of overloading the now not-so-SimpleFacets.

          Show
          Erik Hatcher added a comment - Let's please stop adding stuff to SimpleFacets. See SOLR-792 for an approach of adding additional faceting components instead of overloading the now not-so-SimpleFacets.
          Hide
          Gijs Kunze added a comment - - edited

          I'd love to add it as a separate component, maybe moving date faceting to the same component since date faceting and numerical faceting are practically identical. But as I said, I'm not really that proficient in Java, haven't done any real Java programming since Swing was still new. So I'm not really sure I'm up to the task.

          Looking at SOLR-792 as an example I looked at converting my patch but I think some re-factoring on the side of SimpleFacets might be in order first. My patch makes use of the parseParams method (which handles tagging/exclusion local parameters) and I can't see how to cleanly make use of that functionality from a separate component. I could copy the method and the member variables it manipulates to the component but I'd rather not perpetrate that code maintenance no-no.

          Show
          Gijs Kunze added a comment - - edited I'd love to add it as a separate component, maybe moving date faceting to the same component since date faceting and numerical faceting are practically identical. But as I said, I'm not really that proficient in Java, haven't done any real Java programming since Swing was still new. So I'm not really sure I'm up to the task. Looking at SOLR-792 as an example I looked at converting my patch but I think some re-factoring on the side of SimpleFacets might be in order first. My patch makes use of the parseParams method (which handles tagging/exclusion local parameters) and I can't see how to cleanly make use of that functionality from a separate component. I could copy the method and the member variables it manipulates to the component but I'd rather not perpetrate that code maintenance no-no.
          Hide
          Gijs Kunze added a comment -

          Solr 1.4 version of Numerical Range Faceting attached. Works (only) on TrieFields now.

          Show
          Gijs Kunze added a comment - Solr 1.4 version of Numerical Range Faceting attached. Works (only) on TrieFields now.
          Hide
          Hoss Man added a comment -

          Updated version of Gijs previous patch that applies cleanly to trunk

          Misc comments on the patch...

          • i agree with erik, we should split up SimpleFacets and FacetComponent, but that's a seperate probelm. if the patch works well, let's commit and then refactor – besides: there is enough similarity with date faceting that they should stay together and share code anyway, so let's just put number faceting with date faceting now and move them together later when we split up facet.query, facet.field and facet.date/number
            • I think in an ideal world: we could eliminate facet.date and facet.number and just let people use "facet.range=fieldName" along with legitimate values for the other facet.range.* params and have it do the right thing depending on the type of fieldName ... but it might take a little while to get there
          • Note: current patch breaks distributed test, but i think that's just a foolish assumption on the part of the test.
          • functionality currently breaks if any of the numeric params is a float value – the code does lots of special testing to see if the field is a "float" field or not, but it still tries to parse every param as both a long and a float – we need to fix that so any number is valid (even if the field type is "long" it's totally reasonable for people to ask for a gap of "0.5")
          • the facet.date.include param was added after this patch was contributed, this patch uses it's own param/model for dealing with inclusion/exclusion – it should be changed to match the conventions dates uses
          Show
          Hoss Man added a comment - Updated version of Gijs previous patch that applies cleanly to trunk Misc comments on the patch... i agree with erik, we should split up SimpleFacets and FacetComponent, but that's a seperate probelm. if the patch works well, let's commit and then refactor – besides: there is enough similarity with date faceting that they should stay together and share code anyway, so let's just put number faceting with date faceting now and move them together later when we split up facet.query, facet.field and facet.date/number I think in an ideal world: we could eliminate facet.date and facet.number and just let people use "facet.range=fieldName" along with legitimate values for the other facet.range.* params and have it do the right thing depending on the type of fieldName ... but it might take a little while to get there Note: current patch breaks distributed test, but i think that's just a foolish assumption on the part of the test. functionality currently breaks if any of the numeric params is a float value – the code does lots of special testing to see if the field is a "float" field or not, but it still tries to parse every param as both a long and a float – we need to fix that so any number is valid (even if the field type is "long" it's totally reasonable for people to ask for a gap of "0.5") the facet.date.include param was added after this patch was contributed, this patch uses it's own param/model for dealing with inclusion/exclusion – it should be changed to match the conventions dates uses
          Hide
          Hoss Man added a comment -

          Updated Patch...

          • changed numeric faceting to use "facet.number.include" param with the same semantics as "facet.date.include". To make this cleaner i renamed FacetParams.FacetDateInclude to the more generic FacetParams.FacetRangeInclude (it has never been included in a release)
          • likewise renamed FacetParams.FacetNumberOther to FacetParams.FacetRangeOther and deprecated FacetParams.FacetDateOther, changing the usages in SimpleFacets to the new more generic enumeration.
          • fixed numeric faceting so that it supports facet.mincount (didn't notice it was missing that before)
          • fixed the bug that was causeing TestDistributed to fail (FacetComponent needed to add an empty "facet_number" block for consistency ... at least until we add support for distributed numeric faceting)
          Show
          Hoss Man added a comment - Updated Patch... changed numeric faceting to use "facet.number.include" param with the same semantics as "facet.date.include". To make this cleaner i renamed FacetParams.FacetDateInclude to the more generic FacetParams.FacetRangeInclude (it has never been included in a release) likewise renamed FacetParams.FacetNumberOther to FacetParams.FacetRangeOther and deprecated FacetParams.FacetDateOther, changing the usages in SimpleFacets to the new more generic enumeration. fixed numeric faceting so that it supports facet.mincount (didn't notice it was missing that before) fixed the bug that was causeing TestDistributed to fail (FacetComponent needed to add an empty "facet_number" block for consistency ... at least until we add support for distributed numeric faceting)
          Hide
          Hoss Man added a comment -

          side comment: i think we should rename all of these new params "facet.range*" instead of "facet.number*" ... even if we can't merge the code and make it autodetecd dates using the word "range" to describe the type of faceting seems more clear then "number"

          Show
          Hoss Man added a comment - side comment: i think we should rename all of these new params "facet.range*" instead of "facet.number*" ... even if we can't merge the code and make it autodetecd dates using the word "range" to describe the type of faceting seems more clear then "number"
          Hide
          Hoss Man added a comment -

          Checkoint, two main changes over the previous patch:

          1) before changing anything else, i went ahead and added some tests using a TrieFloat field .. this is ripe for refactoring so that it will work for double as well, but we also need some decent int/long testing to.

          2) renamed from "number" facets to "range" facets (so all the params now start with "facet.range...." instead of "facet.number..."

          My next step is to fix the problem with not being able to specify decimal values in params because every arg is also parsed as a Long. MY idea is to refactor things so we use a new "StrNumber" type class that knows about the FieldType and hides the details of the underlying type – so the logic for looping over teh ranges can just think in terms of adding "Strings"

          If it works out, this refactoring should be able to encapsulate the existing date faceting as well. (and probably support range faceting on legacy Sortable_Field types)

          (Gijs: have you had a change to review any of these changes to your original patch? I'd love to know what you think)

          Show
          Hoss Man added a comment - Checkoint, two main changes over the previous patch: 1) before changing anything else, i went ahead and added some tests using a TrieFloat field .. this is ripe for refactoring so that it will work for double as well, but we also need some decent int/long testing to. 2) renamed from "number" facets to "range" facets (so all the params now start with "facet.range...." instead of "facet.number..." My next step is to fix the problem with not being able to specify decimal values in params because every arg is also parsed as a Long. MY idea is to refactor things so we use a new "StrNumber" type class that knows about the FieldType and hides the details of the underlying type – so the logic for looping over teh ranges can just think in terms of adding "Strings" If it works out, this refactoring should be able to encapsulate the existing date faceting as well. (and probably support range faceting on legacy Sortable_Field types) (Gijs: have you had a change to review any of these changes to your original patch? I'd love to know what you think)
          Hide
          Gijs Kunze added a comment -

          Hoss, I haven't taken a very good look yet but from what I've read in the comments I think all the changes are good ones.

          I'm currently working on a pretty strict deadline but afterwards I'll see on testing your patch with our project and see if I run into any issues (which I doubt).

          Show
          Gijs Kunze added a comment - Hoss, I haven't taken a very good look yet but from what I've read in the comments I think all the changes are good ones. I'm currently working on a pretty strict deadline but afterwards I'll see on testing your patch with our project and see if I run into any issues (which I doubt).
          Hide
          Hoss Man added a comment -

          Checkpoint, updated patch...

          • Works for all TrieField types (ie: solved the float parse
            problem) by using delegation to a new RangeEndpointCalculator
            abstraction. we pick the Impl based on the field type, and then use
            it fo all the math and comparisons. (this is what my previous
            "StrNumber" idea morphed into once i started working on it .. seemed
            simpler to understand)
          • added tests for double/long/int ranges

          Note regarding one of my previous comments...

          (even if the field type is "long" it's totally reasonable for people to ask for a gap of "0.5")

          this is not supported by my updated patch .. the more i started
          thinking about it the less it made sense – it would only be useful to
          let people do specify a "float" gap on a "long" field if the field
          types would then let you create range queries that would "do the right
          thing" when the end points where fractional – and that's just not the
          case. so for now the "gap" values have to be parsable using the same
          type as the field – but i left flexibility in there so that it
          could be added later (this was largely in anticipation of directly
          supporting Dates where the "gap" is a DateMath string)

          Show
          Hoss Man added a comment - Checkpoint, updated patch... Works for all TrieField types (ie: solved the float parse problem) by using delegation to a new RangeEndpointCalculator abstraction. we pick the Impl based on the field type, and then use it fo all the math and comparisons. (this is what my previous "StrNumber" idea morphed into once i started working on it .. seemed simpler to understand) added tests for double/long/int ranges Note regarding one of my previous comments... (even if the field type is "long" it's totally reasonable for people to ask for a gap of "0.5") this is not supported by my updated patch .. the more i started thinking about it the less it made sense – it would only be useful to let people do specify a "float" gap on a "long" field if the field types would then let you create range queries that would "do the right thing" when the end points where fractional – and that's just not the case. so for now the "gap" values have to be parsable using the same type as the field – but i left flexibility in there so that it could be added later (this was largely in anticipation of directly supporting Dates where the "gap" is a DateMath string)
          Hide
          Hoss Man added a comment -

          Updated patch. I think this may be ready to commit.

          Changes to previous patch...

          • fixed a bug in one test where i was ignoring the fieldName
          • move "metadata" info a new sub NamedList called "meta" - something that was kinda overlooked with date faceting was the co-mingling of facet counts (named after the low point of the range) with other metadata – which is extra ugly now that things like 'gap' and 'end' can also be of type "int". since this facet_range section is all new it doesn't need to have the exact same structure as facet_date, so i think we should clean it up and keep the "range" counts distinct from the other metadata
          • made it support all of the Sortable(Number)Field types, refactored the tests to verify this.
          • make it support Date fields directly – this means you can use facet.range=dateFieldName&facet.range.start=NOW/MONTH&facet.range.gap=... and the "date" type faceting will show up in the "facet_ranges" section. the old "facet.date" stuff still works – but all that code is duplicated, and not shared (yet)
          • refactored existing date faceting tests to both facet.range and facet.date usage (which was kind of a pain combined with the "meta" change – but it works)

          feedback would definitely be appreciated

          Show
          Hoss Man added a comment - Updated patch. I think this may be ready to commit. Changes to previous patch... fixed a bug in one test where i was ignoring the fieldName move "metadata" info a new sub NamedList called "meta" - something that was kinda overlooked with date faceting was the co-mingling of facet counts (named after the low point of the range) with other metadata – which is extra ugly now that things like 'gap' and 'end' can also be of type "int". since this facet_range section is all new it doesn't need to have the exact same structure as facet_date, so i think we should clean it up and keep the "range" counts distinct from the other metadata made it support all of the Sortable(Number)Field types, refactored the tests to verify this. make it support Date fields directly – this means you can use facet.range=dateFieldName&facet.range.start=NOW/MONTH&facet.range.gap=... and the "date" type faceting will show up in the "facet_ranges" section. the old "facet.date" stuff still works – but all that code is duplicated, and not shared (yet) refactored existing date faceting tests to both facet.range and facet.date usage (which was kind of a pain combined with the "meta" change – but it works) feedback would definitely be appreciated
          Hide
          Hoss Man added a comment -

          Using the example data, here's an example of a monster query that uses facet.date and facet.range on the same field with the same start/end/gap as well as doing facet.range on the (float) price and (ing) popularity fields ...

          http://localhost:8983/solr/select/?wt=xml&indent=true&echoParams=none&q=*:*&rows=0&facet.range=price&f.price.facet.range.start=-2.3&f.price.facet.range.end=145.66&f.price.facet.range.gap=13.7432&facet.date=manufacturedate_dt&facet.date.start=NOW/YEAR-5YEARS&facet.date.end=NOW/YEAR%2B1DAY&facet.date.gap=%2B1YEAR&facet=true&facet.range.other=all&facet.range=manufacturedate_dt&f.manufacturedate_dt.facet.range.start=NOW/YEAR-5YEARS&f.manufacturedate_dt.facet.range.end=NOW/YEAR%2B1YEAR&f.manufacturedate_dt.facet.range.gap=%2B1YEAR&facet.range=popularity&facet.range.start=-2&facet.range.gap=3&facet.range.end=11

          <?xml version="1.0" encoding="UTF-8"?>
          <response>
          
          <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">5</int>
          </lst>
          <result name="response" numFound="19" start="0"/>
          <lst name="facet_counts">
            <lst name="facet_queries"/>
            <lst name="facet_fields"/>
          
            <lst name="facet_dates">
              <lst name="manufacturedate_dt">
                <int name="2005-01-01T00:00:00Z">2</int>
                <int name="2006-01-01T00:00:00Z">9</int>
                <int name="2007-01-01T00:00:00Z">0</int>
                <int name="2008-01-01T00:00:00Z">0</int>
                <int name="2009-01-01T00:00:00Z">0</int>
          
                <int name="2010-01-01T00:00:00Z">0</int>
                <str name="gap">+1YEAR</str>
                <date name="end">2011-01-01T00:00:00Z</date>
              </lst>
            </lst>
            <lst name="facet_ranges">
              <lst name="price">
          
                <lst name="meta">
                  <float name="gap">13.7432</float>
                  <float name="end">148.87521</float>
                  <int name="before">0</int>
                  <int name="after">10</int>
                  <int name="between">7</int>
          
                </lst>
                <int name="-2.3">2</int>
                <int name="11.4432">2</int>
                <int name="25.186401">0</int>
                <int name="38.929604">0</int>
                <int name="52.672806">0</int>
          
                <int name="66.41601">2</int>
                <int name="80.15921">1</int>
                <int name="93.90241">0</int>
                <int name="107.645615">0</int>
                <int name="121.38882">0</int>
                <int name="135.13202">0</int>
          
              </lst>
              <lst name="manufacturedate_dt">
                <lst name="meta">
                  <str name="gap">+1YEAR</str>
                  <date name="end">2011-01-01T00:00:00Z</date>
                  <int name="before">0</int>
                  <int name="after">0</int>
          
                  <int name="between">11</int>
                </lst>
                <int name="2005-01-01T00:00:00Z">2</int>
                <int name="2006-01-01T00:00:00Z">9</int>
                <int name="2007-01-01T00:00:00Z">0</int>
                <int name="2008-01-01T00:00:00Z">0</int>
          
                <int name="2009-01-01T00:00:00Z">0</int>
                <int name="2010-01-01T00:00:00Z">0</int>
              </lst>
              <lst name="popularity">
                <lst name="meta">
                  <int name="gap">3</int>
                  <int name="end">13</int>
          
                  <int name="before">0</int>
                  <int name="after">0</int>
                  <int name="between">18</int>
                </lst>
                <int name="-2">3</int>
                <int name="1">2</int>
          
                <int name="4">13</int>
                <int name="7">7</int>
                <int name="10">2</int>
              </lst>
            </lst>
          </lst>
          </response>
          
          
          Show
          Hoss Man added a comment - Using the example data, here's an example of a monster query that uses facet.date and facet.range on the same field with the same start/end/gap as well as doing facet.range on the (float) price and (ing) popularity fields ... http://localhost:8983/solr/select/?wt=xml&indent=true&echoParams=none&q=*:*&rows=0&facet.range=price&f.price.facet.range.start=-2.3&f.price.facet.range.end=145.66&f.price.facet.range.gap=13.7432&facet.date=manufacturedate_dt&facet.date.start=NOW/YEAR-5YEARS&facet.date.end=NOW/YEAR%2B1DAY&facet.date.gap=%2B1YEAR&facet=true&facet.range.other=all&facet.range=manufacturedate_dt&f.manufacturedate_dt.facet.range.start=NOW/YEAR-5YEARS&f.manufacturedate_dt.facet.range.end=NOW/YEAR%2B1YEAR&f.manufacturedate_dt.facet.range.gap=%2B1YEAR&facet.range=popularity&facet.range.start=-2&facet.range.gap=3&facet.range.end=11 <?xml version= "1.0" encoding= "UTF-8" ?> <response> <lst name= "responseHeader" > <int name= "status" > 0 </int> <int name= "QTime" > 5 </int> </lst> <result name= "response" numFound= "19" start= "0" /> <lst name= "facet_counts" > <lst name= "facet_queries" /> <lst name= "facet_fields" /> <lst name= "facet_dates" > <lst name= "manufacturedate_dt" > <int name= "2005-01-01T00:00:00Z" > 2 </int> <int name= "2006-01-01T00:00:00Z" > 9 </int> <int name= "2007-01-01T00:00:00Z" > 0 </int> <int name= "2008-01-01T00:00:00Z" > 0 </int> <int name= "2009-01-01T00:00:00Z" > 0 </int> <int name= "2010-01-01T00:00:00Z" > 0 </int> <str name= "gap" > +1YEAR </str> <date name= "end" > 2011-01-01T00:00:00Z </date> </lst> </lst> <lst name= "facet_ranges" > <lst name= "price" > <lst name= "meta" > <float name= "gap" > 13.7432 </float> <float name= "end" > 148.87521 </float> <int name= "before" > 0 </int> <int name= "after" > 10 </int> <int name= "between" > 7 </int> </lst> <int name= "-2.3" > 2 </int> <int name= "11.4432" > 2 </int> <int name= "25.186401" > 0 </int> <int name= "38.929604" > 0 </int> <int name= "52.672806" > 0 </int> <int name= "66.41601" > 2 </int> <int name= "80.15921" > 1 </int> <int name= "93.90241" > 0 </int> <int name= "107.645615" > 0 </int> <int name= "121.38882" > 0 </int> <int name= "135.13202" > 0 </int> </lst> <lst name= "manufacturedate_dt" > <lst name= "meta" > <str name= "gap" > +1YEAR </str> <date name= "end" > 2011-01-01T00:00:00Z </date> <int name= "before" > 0 </int> <int name= "after" > 0 </int> <int name= "between" > 11 </int> </lst> <int name= "2005-01-01T00:00:00Z" > 2 </int> <int name= "2006-01-01T00:00:00Z" > 9 </int> <int name= "2007-01-01T00:00:00Z" > 0 </int> <int name= "2008-01-01T00:00:00Z" > 0 </int> <int name= "2009-01-01T00:00:00Z" > 0 </int> <int name= "2010-01-01T00:00:00Z" > 0 </int> </lst> <lst name= "popularity" > <lst name= "meta" > <int name= "gap" > 3 </int> <int name= "end" > 13 </int> <int name= "before" > 0 </int> <int name= "after" > 0 </int> <int name= "between" > 18 </int> </lst> <int name= "-2" > 3 </int> <int name= "1" > 2 </int> <int name= "4" > 13 </int> <int name= "7" > 7 </int> <int name= "10" > 2 </int> </lst> </lst> </lst> </response>
          Hide
          Yonik Seeley added a comment -

          Thanks for the example, makes it so much easier to casually review.

          Rather than embedding "meta" to the list containing the counts, perhaps we should bite the bullet and add an additional level for the counts. It would have been useful for other faceting types as well (and still would be in the future I think). It should be much easier (and more consistent) for clients to handle, rather than trying to exclude the thing called "meta" when building the list of counts returned.

              <lst name="popularity">
                  <int name="gap">3</int>
                  <int name="end">13</int>
                  <int name="before">0</int>
                  <int name="after">0</int>
                  <int name="between">18</int>
                  <lst name="counts">
                      <int name="-2">3</int>
                      <int name="1">2</int>
                      <int name="4">13</int>
                      <int name="7">7</int>
                      <int name="10">2</int>
                  </lst>
              </lst>
          

          Also, I've never been a fan of adding the empty "facet_range" list when there are no facet.range commands... but I understand it's consistent with the other facet types.

          Show
          Yonik Seeley added a comment - Thanks for the example, makes it so much easier to casually review. Rather than embedding "meta" to the list containing the counts, perhaps we should bite the bullet and add an additional level for the counts. It would have been useful for other faceting types as well (and still would be in the future I think). It should be much easier (and more consistent) for clients to handle, rather than trying to exclude the thing called "meta" when building the list of counts returned. <lst name= "popularity" > < int name= "gap" >3</ int > < int name= "end" >13</ int > < int name= "before" >0</ int > < int name= "after" >0</ int > < int name= "between" >18</ int > <lst name= "counts" > < int name= "-2" >3</ int > < int name= "1" >2</ int > < int name= "4" >13</ int > < int name= "7" >7</ int > < int name= "10" >2</ int > </lst> </lst> Also, I've never been a fan of adding the empty "facet_range" list when there are no facet.range commands... but I understand it's consistent with the other facet types.
          Hide
          Hoss Man added a comment -

          Rather than embedding "meta" to the list containing the counts, perhaps we should bite the bullet and add an additional level for the counts.

          yeah ... i'm on board with that idea. it's a trivial change.

          any comments on the implementation?

          i think it's fairly solid – the one wish i have though is to try and gut the existing date faceting code to just use the new code – but i can't see a very easy way to do that while dealing with the differnet param names .. suggestions?

          Show
          Hoss Man added a comment - Rather than embedding "meta" to the list containing the counts, perhaps we should bite the bullet and add an additional level for the counts. yeah ... i'm on board with that idea. it's a trivial change. any comments on the implementation? i think it's fairly solid – the one wish i have though is to try and gut the existing date faceting code to just use the new code – but i can't see a very easy way to do that while dealing with the differnet param names .. suggestions?
          Hide
          Gijs Kunze added a comment - - edited

          I like the extra level, it'll make my response parser a little bit less messy. But if I may be so bold as to make another suggestion (which could also apply to other forms of faceting). On the client side of things it is starting to become more difficult to actually apply the correct filter queries to filter on the facets. How about an option to request a more verbose output, something like this:

          <lst name="popularity">
                  <int name="start">-2</int>
                  <int name="gap">3</int>
                  <int name="end">13</int>
                  <int name="before">0</int>
                  <int name="after">0</int>
                  <int name="between">18</int>
                  <lst name="counts">
                      <lst name="-2">
                          <int name="start">-2</int>
                          <int name="end">1</int>
                          <str name="filter">popularity:([-2 TO 1])</int>
                          <int name="count">3</int>
                      <lst>
                      <lst name="1">
                          <int name="start">1</int>
                          <int name="end">4</int>
                          <str name="filter">popularity:([1 TO 4])</int>
                          <int name="count">2</int>
                      <lst>
                      <lst name="4">
                          <int name="start">4</int>
                          <int name="end">7</int>
                          <str name="filter">popularity:([4 TO 7])</int>
                          <int name="count">13</int>
                      <lst>
                      <lst name="7">
                          <int name="start">7</int>
                          <int name="end">10</int>
                          <str name="filter">popularity:([7 TO 10])</int>
                          <int name="count">7</int>
                      <lst>
                      <lst name="10">
                          <int name="start">10</int>
                          <int name="end">13</int>
                          <str name="filter">popularity:([10 TO 13])</int>
                          <int name="count">2</int>
                      <lst>
                  </lst>
          </lst>
          

          of course, if you use the include parameter to get more useful facets, the filter would be something like:

                          <str name="filter">popularity:([10 TO *] AND {* TO 13})</int>
          

          This would make client side code a lot easier I think.

          p.s. I noticed the start parameter was missing from your xml example, is that on purpose or a slight oversight?

          Show
          Gijs Kunze added a comment - - edited I like the extra level, it'll make my response parser a little bit less messy. But if I may be so bold as to make another suggestion (which could also apply to other forms of faceting). On the client side of things it is starting to become more difficult to actually apply the correct filter queries to filter on the facets. How about an option to request a more verbose output, something like this: <lst name= "popularity" > <int name= "start" > -2 </int> <int name= "gap" > 3 </int> <int name= "end" > 13 </int> <int name= "before" > 0 </int> <int name= "after" > 0 </int> <int name= "between" > 18 </int> <lst name= "counts" > <lst name= "-2" > <int name= "start" > -2 </int> <int name= "end" > 1 </int> <str name= "filter" > popularity:([-2 TO 1]) </int> <int name= "count" > 3 </int> <lst> <lst name= "1" > <int name= "start" > 1 </int> <int name= "end" > 4 </int> <str name= "filter" > popularity:([1 TO 4]) </int> <int name= "count" > 2 </int> <lst> <lst name= "4" > <int name= "start" > 4 </int> <int name= "end" > 7 </int> <str name= "filter" > popularity:([4 TO 7]) </int> <int name= "count" > 13 </int> <lst> <lst name= "7" > <int name= "start" > 7 </int> <int name= "end" > 10 </int> <str name= "filter" > popularity:([7 TO 10]) </int> <int name= "count" > 7 </int> <lst> <lst name= "10" > <int name= "start" > 10 </int> <int name= "end" > 13 </int> <str name= "filter" > popularity:([10 TO 13]) </int> <int name= "count" > 2 </int> <lst> </lst> </lst> of course, if you use the include parameter to get more useful facets, the filter would be something like: <str name= "filter" > popularity:([10 TO *] AND {* TO 13}) </int> This would make client side code a lot easier I think. p.s. I noticed the start parameter was missing from your xml example, is that on purpose or a slight oversight?
          Hide
          Hoss Man added a comment -

          How about an option to request a more verbose output, something like this:

          I'm not opposed to having a more verbose output option – but it should definitely be optional. I'd like to suggest that we tackle that in an independent Jira issue. The main question I have for you know is: do you see any obvious changes we should make to the params/format introduced in this issue in anticipation of making it simple to add a more verbose option/format in a subsequent issue?

          (Personally: i'm not really interested in working on the verbose format. I think putting the same effort into something like SOLR-1896 would be more useful: it would keep the response format small, and make it trivial for people to "filter" on individual ranges, even with complex "include" params)

          p.s. I noticed the start parameter was missing from your xml example, is that on purpose or a slight oversight?

          no ... we've never included "start" in the date faceting (so i didn't include it here) because it's always the value of the first constraint count ... the only reason "end" was ever included is because it isn't immediately obvious what the upper bound of the last range is from the gap: the "hardend" param can modify it.

          do you think we really need to explicitly label the start? if we add an optional verbose format that can obviously include it, is it really needed in this simple output?

          Show
          Hoss Man added a comment - How about an option to request a more verbose output, something like this: I'm not opposed to having a more verbose output option – but it should definitely be optional. I'd like to suggest that we tackle that in an independent Jira issue. The main question I have for you know is: do you see any obvious changes we should make to the params/format introduced in this issue in anticipation of making it simple to add a more verbose option/format in a subsequent issue? (Personally: i'm not really interested in working on the verbose format. I think putting the same effort into something like SOLR-1896 would be more useful: it would keep the response format small, and make it trivial for people to "filter" on individual ranges, even with complex "include" params) p.s. I noticed the start parameter was missing from your xml example, is that on purpose or a slight oversight? no ... we've never included "start" in the date faceting (so i didn't include it here) because it's always the value of the first constraint count ... the only reason "end" was ever included is because it isn't immediately obvious what the upper bound of the last range is from the gap: the "hardend" param can modify it. do you think we really need to explicitly label the start? if we add an optional verbose format that can obviously include it, is it really needed in this simple output?
          Hide
          Hoss Man added a comment -

          Updated patch that removes the "meta" NamedList and instead pushes the range counts down into a new "counts" NamedList. tests have been updated as well.

          Same example as before...

          http://localhost:8983/solr/select/?wt=xml&indent=true&echoParams=none&q=*:*&rows=0&facet.range=price&f.price.facet.range.start=-2.3&f.price.facet.range.end=145.66&f.price.facet.range.gap=13.7432&facet.date=manufacturedate_dt&facet.date.start=NOW/YEAR-5YEARS&facet.date.end=NOW/YEAR%2B1DAY&facet.date.gap=%2B1YEAR&facet=true&facet.range.other=all&facet.range=manufacturedate_dt&f.manufacturedate_dt.facet.range.start=NOW/YEAR-5YEARS&f.manufacturedate_dt.facet.range.end=NOW/YEAR%2B1YEAR&f.manufacturedate_dt.facet.range.gap=%2B1YEAR&facet.range=popularity&facet.range.start=-2&facet.range.gap=3&facet.range.end=11

          new output...

          
          <lst name="facet_counts">
            <lst name="facet_queries"/>
            <lst name="facet_fields"/>
            <lst name="facet_dates">
              <lst name="manufacturedate_dt">
                <int name="2005-01-01T00:00:00Z">2</int>
                <int name="2006-01-01T00:00:00Z">9</int>
                <int name="2007-01-01T00:00:00Z">0</int>
                <int name="2008-01-01T00:00:00Z">0</int>
                <int name="2009-01-01T00:00:00Z">0</int>
                <int name="2010-01-01T00:00:00Z">0</int>
                <str name="gap">+1YEAR</str>
                <date name="end">2011-01-01T00:00:00Z</date>
              </lst>
            </lst>
            <lst name="facet_ranges">
              <lst name="price">
                <lst name="counts">
                  <int name="-2.3">2</int>
                  <int name="11.4432">2</int>
                  <int name="25.186401">0</int>
                  <int name="38.929604">0</int>
                  <int name="52.672806">0</int>
                  <int name="66.41601">2</int>
                  <int name="80.15921">1</int>
                  <int name="93.90241">0</int>
                  <int name="107.645615">0</int>
                  <int name="121.38882">0</int>
                  <int name="135.13202">0</int>
                </lst>
                <float name="gap">13.7432</float>
                <float name="end">148.87521</float>
                <int name="before">0</int>
                <int name="after">10</int>
                <int name="between">7</int>
              </lst>
              <lst name="manufacturedate_dt">
                <lst name="counts">
                  <int name="2005-01-01T00:00:00Z">2</int>
                  <int name="2006-01-01T00:00:00Z">9</int>
                  <int name="2007-01-01T00:00:00Z">0</int>
                  <int name="2008-01-01T00:00:00Z">0</int>
                  <int name="2009-01-01T00:00:00Z">0</int>
                  <int name="2010-01-01T00:00:00Z">0</int>
                </lst>
                <str name="gap">+1YEAR</str>
                <date name="end">2011-01-01T00:00:00Z</date>
                <int name="before">0</int>
                <int name="after">0</int>
                <int name="between">11</int>
              </lst>
              <lst name="popularity">
                <lst name="counts">
                  <int name="-2">3</int>
                  <int name="1">2</int>
                  <int name="4">13</int>
                  <int name="7">7</int>
                  <int name="10">2</int>
                </lst>
                <int name="gap">3</int>
                <int name="end">13</int>
                <int name="before">0</int>
                <int name="after">0</int>
                <int name="between">18</int>
              </lst>
            </lst>
          </lst>
          
          Show
          Hoss Man added a comment - Updated patch that removes the "meta" NamedList and instead pushes the range counts down into a new "counts" NamedList. tests have been updated as well. Same example as before... http://localhost:8983/solr/select/?wt=xml&indent=true&echoParams=none&q=*:*&rows=0&facet.range=price&f.price.facet.range.start=-2.3&f.price.facet.range.end=145.66&f.price.facet.range.gap=13.7432&facet.date=manufacturedate_dt&facet.date.start=NOW/YEAR-5YEARS&facet.date.end=NOW/YEAR%2B1DAY&facet.date.gap=%2B1YEAR&facet=true&facet.range.other=all&facet.range=manufacturedate_dt&f.manufacturedate_dt.facet.range.start=NOW/YEAR-5YEARS&f.manufacturedate_dt.facet.range.end=NOW/YEAR%2B1YEAR&f.manufacturedate_dt.facet.range.gap=%2B1YEAR&facet.range=popularity&facet.range.start=-2&facet.range.gap=3&facet.range.end=11 new output... <lst name= "facet_counts" > <lst name= "facet_queries" /> <lst name= "facet_fields" /> <lst name= "facet_dates" > <lst name= "manufacturedate_dt" > <int name= "2005-01-01T00:00:00Z" > 2 </int> <int name= "2006-01-01T00:00:00Z" > 9 </int> <int name= "2007-01-01T00:00:00Z" > 0 </int> <int name= "2008-01-01T00:00:00Z" > 0 </int> <int name= "2009-01-01T00:00:00Z" > 0 </int> <int name= "2010-01-01T00:00:00Z" > 0 </int> <str name= "gap" > +1YEAR </str> <date name= "end" > 2011-01-01T00:00:00Z </date> </lst> </lst> <lst name= "facet_ranges" > <lst name= "price" > <lst name= "counts" > <int name= "-2.3" > 2 </int> <int name= "11.4432" > 2 </int> <int name= "25.186401" > 0 </int> <int name= "38.929604" > 0 </int> <int name= "52.672806" > 0 </int> <int name= "66.41601" > 2 </int> <int name= "80.15921" > 1 </int> <int name= "93.90241" > 0 </int> <int name= "107.645615" > 0 </int> <int name= "121.38882" > 0 </int> <int name= "135.13202" > 0 </int> </lst> <float name= "gap" > 13.7432 </float> <float name= "end" > 148.87521 </float> <int name= "before" > 0 </int> <int name= "after" > 10 </int> <int name= "between" > 7 </int> </lst> <lst name= "manufacturedate_dt" > <lst name= "counts" > <int name= "2005-01-01T00:00:00Z" > 2 </int> <int name= "2006-01-01T00:00:00Z" > 9 </int> <int name= "2007-01-01T00:00:00Z" > 0 </int> <int name= "2008-01-01T00:00:00Z" > 0 </int> <int name= "2009-01-01T00:00:00Z" > 0 </int> <int name= "2010-01-01T00:00:00Z" > 0 </int> </lst> <str name= "gap" > +1YEAR </str> <date name= "end" > 2011-01-01T00:00:00Z </date> <int name= "before" > 0 </int> <int name= "after" > 0 </int> <int name= "between" > 11 </int> </lst> <lst name= "popularity" > <lst name= "counts" > <int name= "-2" > 3 </int> <int name= "1" > 2 </int> <int name= "4" > 13 </int> <int name= "7" > 7 </int> <int name= "10" > 2 </int> </lst> <int name= "gap" > 3 </int> <int name= "end" > 13 </int> <int name= "before" > 0 </int> <int name= "after" > 0 </int> <int name= "between" > 18 </int> </lst> </lst> </lst>
          Hide
          Gijs Kunze added a comment -

          p.s. I noticed the start parameter was missing from your xml example, is that on purpose or a slight oversight?

          no ... we've never included "start" in the date faceting (so i didn't include it here) because it's always the value of the first constraint count ... the only reason "end" was ever included is because it isn't immediately obvious what the upper bound of the last range is from the gap: the "hardend" param can modify it.

          do you think we really need to explicitly label the start? if we add an optional verbose format that can obviously include it, is it really needed in this simple output?

          Ahh ok, that makes sense. However with the mincount parameter you do not always have the start parameter, nor do you always have both values of a single count. This not be much of an issue with dates and integers as you can simply add the gap to the start value but with floating point numbers I'm a little more wary adding the gap to floating point values due to rounding errors.

          The verbose response option was just a thought. While some parts might be useful thinking more about it the filter response isn't as useful as I thought it would be while writing the post. It's just that I'd like it if there were more of a link between the facets one selects and the filters based on those facets. But I guess that's another discussion.

          Show
          Gijs Kunze added a comment - p.s. I noticed the start parameter was missing from your xml example, is that on purpose or a slight oversight? no ... we've never included "start" in the date faceting (so i didn't include it here) because it's always the value of the first constraint count ... the only reason "end" was ever included is because it isn't immediately obvious what the upper bound of the last range is from the gap: the "hardend" param can modify it. do you think we really need to explicitly label the start? if we add an optional verbose format that can obviously include it, is it really needed in this simple output? Ahh ok, that makes sense. However with the mincount parameter you do not always have the start parameter, nor do you always have both values of a single count. This not be much of an issue with dates and integers as you can simply add the gap to the start value but with floating point numbers I'm a little more wary adding the gap to floating point values due to rounding errors. The verbose response option was just a thought. While some parts might be useful thinking more about it the filter response isn't as useful as I thought it would be while writing the post. It's just that I'd like it if there were more of a link between the facets one selects and the filters based on those facets. But I guess that's another discussion.
          Hide
          Hoss Man added a comment -

          Ahh ok, that makes sense. However with the mincount parameter you do not always have the start parameter, nor do you always have both values of a single count. This not be much of an issue with dates and integers as you can simply add the gap to the start value but with floating point numbers I'm a little more wary adding the gap to floating point values due to rounding errors.

          Whoa .. you just blew my mind.

          Seriously, i'm not sure how that was so horribly overlooked when mincount support was added to date faceting – thank you for bringing this up.

          The possibility of rounding errors for an arbitrary range doesn't concern me too much, but i can see how in some cases it might be a factor if the client code uses differnet floating point precision then Java does – like i said, i think something like SOLR-1896 is hte best way to solve all of that stuff. What does concern me is your point about how the "first" range might not be there at all because of the mincount – that makes it impossible to do anything useful with the "before" and "between" counts (regardless of any floating point rounding errors) unless you have echoParams or some other way to "know" what the start value was – but by that logic you don't need to know what the "gap" is either – people shouldn't be required to know what exactly all the query params were to make sense of the data.

          we should definitely add the "start" value.

          It's just that I'd like it if there were more of a link between the facets one selects and the filters based on those facets. But I guess that's another discussion.

          Agreed ... i'd definitely like to see SOLR-1896 make it trivial to just refer to the "low" value of any range that comes back from range faceting in an "fq and have solr automaticly build a range filter with the appropriate upper bound and inclusion properties.

          Show
          Hoss Man added a comment - Ahh ok, that makes sense. However with the mincount parameter you do not always have the start parameter, nor do you always have both values of a single count. This not be much of an issue with dates and integers as you can simply add the gap to the start value but with floating point numbers I'm a little more wary adding the gap to floating point values due to rounding errors. Whoa .. you just blew my mind. Seriously, i'm not sure how that was so horribly overlooked when mincount support was added to date faceting – thank you for bringing this up. The possibility of rounding errors for an arbitrary range doesn't concern me too much, but i can see how in some cases it might be a factor if the client code uses differnet floating point precision then Java does – like i said, i think something like SOLR-1896 is hte best way to solve all of that stuff. What does concern me is your point about how the "first" range might not be there at all because of the mincount – that makes it impossible to do anything useful with the "before" and "between" counts (regardless of any floating point rounding errors) unless you have echoParams or some other way to "know" what the start value was – but by that logic you don't need to know what the "gap" is either – people shouldn't be required to know what exactly all the query params were to make sense of the data. we should definitely add the "start" value. It's just that I'd like it if there were more of a link between the facets one selects and the filters based on those facets. But I guess that's another discussion. Agreed ... i'd definitely like to see SOLR-1896 make it trivial to just refer to the "low" value of any range that comes back from range faceting in an "fq and have solr automaticly build a range filter with the appropriate upper bound and inclusion properties.
          Hide
          Hoss Man added a comment -

          updated patch to include the "start" values ... did this for range faceting and the legacy date faceting (where it shouldn't cause any problems)

          I think this is good to go, for both 3.1 and 4.0

          objections?

          Show
          Hoss Man added a comment - updated patch to include the "start" values ... did this for range faceting and the legacy date faceting (where it shouldn't cause any problems) I think this is good to go, for both 3.1 and 4.0 objections?
          Hide
          Hoss Man added a comment -

          Committed revision 980555. - trunk

          Show
          Hoss Man added a comment - Committed revision 980555. - trunk
          Hide
          Hoss Man added a comment -

          FYI: still planning on back merge this to 3x, but there were some weird merge conflicts that i need to resolve after lunch

          Show
          Hoss Man added a comment - FYI: still planning on back merge this to 3x, but there were some weird merge conflicts that i need to resolve after lunch
          Hide
          Hoss Man added a comment -

          Ah ... ok, the problem is that i never merged SOLR-397 to 3x ... doing that is pretty much a pre-req for merging this to the 3x branch.

          Show
          Hoss Man added a comment - Ah ... ok, the problem is that i never merged SOLR-397 to 3x ... doing that is pretty much a pre-req for merging this to the 3x branch.
          Hide
          Hoss Man added a comment -

          Committed revision 980592 - trunk typo fix

          Show
          Hoss Man added a comment - Committed revision 980592 - trunk typo fix
          Hide
          Hoss Man added a comment -

          Committed revision 980610. - 3x branch merge

          this merge involved a lot of extra merging for previous comments SimpleFacetsTest because there were just so many refactoring style changes that it just made more sense to merge them all in (they didn't depend on any new functionality not already in 3x)

          Big thanks to Gijs for kicking off this issue, pointing out how easy it was to generalize the date faceting logic to work for numbers, and for catching the issue with facet mincount.

          Show
          Hoss Man added a comment - Committed revision 980610. - 3x branch merge this merge involved a lot of extra merging for previous comments SimpleFacetsTest because there were just so many refactoring style changes that it just made more sense to merge them all in (they didn't depend on any new functionality not already in 3x) Big thanks to Gijs for kicking off this issue, pointing out how easy it was to generalize the date faceting logic to work for numbers, and for catching the issue with facet mincount.
          Hide
          Grant Ingersoll added a comment -

          Hoss or Gijs,

          Any chance one of you can add the appropriate docs to http://wiki.apache.org/solr/SimpleFacetParameters?

          Show
          Grant Ingersoll added a comment - Hoss or Gijs, Any chance one of you can add the appropriate docs to http://wiki.apache.org/solr/SimpleFacetParameters?
          Hide
          Grant Ingersoll added a comment -

          FYI, I added docs to the link above. If one of you can review, that would be great.

          Show
          Grant Ingersoll added a comment - FYI, I added docs to the link above. If one of you can review, that would be great.
          Hide
          Yonik Seeley added a comment -

          Something I just noticed when using JSON: it looks like a SimpleOrderedMap is used instead of a NamedList for the facet counts (indicating that having things as a map is more important than preserving order).

          Normal field faceting uses a NamedList for the counts to favor preserving order. Should this be consistent?

          Show
          Yonik Seeley added a comment - Something I just noticed when using JSON: it looks like a SimpleOrderedMap is used instead of a NamedList for the facet counts (indicating that having things as a map is more important than preserving order). Normal field faceting uses a NamedList for the counts to favor preserving order. Should this be consistent?
          Hide
          Hoss Man added a comment -

          Normal field faceting uses a NamedList for the counts to favor preserving order. Should this be consistent?

          Hmmm.... I wrote it to be consistent with date faceting which has always used SimpleOrderedMap ... not sure why.

          I think you are right though, NamedList makes more sense (and range faceting's output is already not output compatible with date faceting, so it's not a big deal)

          Show
          Hoss Man added a comment - Normal field faceting uses a NamedList for the counts to favor preserving order. Should this be consistent? Hmmm.... I wrote it to be consistent with date faceting which has always used SimpleOrderedMap ... not sure why. I think you are right though, NamedList makes more sense (and range faceting's output is already not output compatible with date faceting, so it's not a big deal)
          Hide
          Hoss Man added a comment -

          one line patch to make the "counts" section for each field use a NamedList instead of aSimpleOrderedMap. This means that the "fields" and "metadata" (gap, start, between, after, etc...) structures are still maps with simple key lookups, but when you start looking at the individual rnages and counts they are controlled by json.nl.

          Same example as before, with "wt=json&json.nl=arrarr"

          {
            "responseHeader":{
              "status":0,
              "QTime":6},
            "response":{"numFound":20,"start":0,"docs":[]
            },
            "facet_counts":{
              "facet_queries":{},
              "facet_fields":{},
              "facet_dates":{
                "manufacturedate_dt":{
                  "2005-01-01T00:00:00Z":2,
                  "2006-01-01T00:00:00Z":9,
                  "2007-01-01T00:00:00Z":0,
                  "2008-01-01T00:00:00Z":0,
                  "2009-01-01T00:00:00Z":0,
                  "2010-01-01T00:00:00Z":0,
                  "gap":"+1YEAR",
                  "start":"2005-01-01T00:00:00Z",
                  "end":"2011-01-01T00:00:00Z"}},
              "facet_ranges":{
                "price":{
                  "counts":
                  [
                    ["-2.3",2],
                    ["11.4432",2],
                    ["25.186401",0],
                    ["38.929604",0],
                    ["52.672806",0],
                    ["66.41601",2],
                    ["80.15921",1],
                    ["93.90241",0],
                    ["107.645615",0],
                    ["121.38882",0],
                    ["135.13202",0]],
                  "gap":13.7432,
                  "start":-2.3,
                  "end":148.87521,
                  "before":0,
                  "after":10,
                  "between":7},
                "manufacturedate_dt":{
                  "counts":
                  [
                    ["2005-01-01T00:00:00Z",2],
                    ["2006-01-01T00:00:00Z",9],
                    ["2007-01-01T00:00:00Z",0],
                    ["2008-01-01T00:00:00Z",0],
                    ["2009-01-01T00:00:00Z",0],
                    ["2010-01-01T00:00:00Z",0]],
                  "gap":"+1YEAR",
                  "start":"2005-01-01T00:00:00Z",
                  "end":"2011-01-01T00:00:00Z",
                  "before":0,
                  "after":0,
                  "between":11},
                "popularity":{
                  "counts":
                  [
                    ["-2",3],
                    ["1",2],
                    ["4",13],
                    ["7",7],
                    ["10",2]],
                  "gap":3,
                  "start":-2,
                  "end":13,
                  "before":0,
                  "after":0,
                  "between":18}}}}
          
          Show
          Hoss Man added a comment - one line patch to make the "counts" section for each field use a NamedList instead of aSimpleOrderedMap. This means that the "fields" and "metadata" (gap, start, between, after, etc...) structures are still maps with simple key lookups, but when you start looking at the individual rnages and counts they are controlled by json.nl. Same example as before, with "wt=json&json.nl=arrarr" { "responseHeader" :{ "status" :0, "QTime" :6}, "response" :{ "numFound" :20, "start" :0, "docs" :[] }, "facet_counts" :{ "facet_queries" :{}, "facet_fields" :{}, "facet_dates" :{ "manufacturedate_dt" :{ "2005-01-01T00:00:00Z" :2, "2006-01-01T00:00:00Z" :9, "2007-01-01T00:00:00Z" :0, "2008-01-01T00:00:00Z" :0, "2009-01-01T00:00:00Z" :0, "2010-01-01T00:00:00Z" :0, "gap" : "+1YEAR" , "start" : "2005-01-01T00:00:00Z" , "end" : "2011-01-01T00:00:00Z" }}, "facet_ranges" :{ "price" :{ "counts" : [ [ "-2.3" ,2], [ "11.4432" ,2], [ "25.186401" ,0], [ "38.929604" ,0], [ "52.672806" ,0], [ "66.41601" ,2], [ "80.15921" ,1], [ "93.90241" ,0], [ "107.645615" ,0], [ "121.38882" ,0], [ "135.13202" ,0]], "gap" :13.7432, "start" :-2.3, "end" :148.87521, "before" :0, "after" :10, "between" :7}, "manufacturedate_dt" :{ "counts" : [ [ "2005-01-01T00:00:00Z" ,2], [ "2006-01-01T00:00:00Z" ,9], [ "2007-01-01T00:00:00Z" ,0], [ "2008-01-01T00:00:00Z" ,0], [ "2009-01-01T00:00:00Z" ,0], [ "2010-01-01T00:00:00Z" ,0]], "gap" : "+1YEAR" , "start" : "2005-01-01T00:00:00Z" , "end" : "2011-01-01T00:00:00Z" , "before" :0, "after" :0, "between" :11}, "popularity" :{ "counts" : [ [ "-2" ,3], [ "1" ,2], [ "4" ,13], [ "7" ,7], [ "10" ,2]], "gap" :3, "start" :-2, "end" :13, "before" :0, "after" :0, "between" :18}}}}
          Hide
          Hoss Man added a comment -

          reopening to give visibility to the NamedList vs SimpleOrderedMap consideration.

          I'm with Yonik leaning towards switching to NamedList for "counts" - anybody else have an opinion?

          Show
          Hoss Man added a comment - reopening to give visibility to the NamedList vs SimpleOrderedMap consideration. I'm with Yonik leaning towards switching to NamedList for "counts" - anybody else have an opinion?
          Hide
          Yonik Seeley added a comment -

          Looks like nobody objects. Since this is a little API change to 3.1, we should get it in soon before it's more frozen.

          Show
          Yonik Seeley added a comment - Looks like nobody objects. Since this is a little API change to 3.1, we should get it in soon before it's more frozen.
          Hide
          Hoss Man added a comment -

          Thanks for the poke yonik, i totally forgot i re-opened this.

          committed the one line change to both 3x and trunk

          Show
          Hoss Man added a comment - Thanks for the poke yonik, i totally forgot i re-opened this. committed the one line change to both 3x and trunk
          Hide
          David Smiley added a comment -

          Two comments:
          1. I think we should let it be known that "facet.date" is deprecated in 3.1. That way it can be removed in a future release without waiting yet another release.
          2. I think it's very odd that the default for the include parameter is for both edges to be inclusive. This means double-counting! Yes, that's how it used to work, but I argue it never should have worked that way and I don't think anyone is actually depending on this behavior. So backwards-compatibility is moot. I propose "lower" be the default.

          Show
          David Smiley added a comment - Two comments: 1. I think we should let it be known that "facet.date" is deprecated in 3.1. That way it can be removed in a future release without waiting yet another release. 2. I think it's very odd that the default for the include parameter is for both edges to be inclusive. This means double-counting! Yes, that's how it used to work, but I argue it never should have worked that way and I don't think anyone is actually depending on this behavior. So backwards-compatibility is moot. I propose "lower" be the default.
          Hide
          Hoss Man added a comment -

          I think we should let it be known that "facet.date" is deprecated in 3.1. That way it can be removed in a future release without waiting yet another release.

          good catch ... i thought i had done that....

          Committed revision 1071842. - trunk
          Committed revision 1071843. - 3x

          ...i'll also add a note to the wiki

          I think it's very odd that the default for the include parameter is for both edges to be inclusive. This means double-counting! Yes, that's how it used to work, but I argue it never should have worked that way and I don't think anyone is actually depending on this behavior. So backwards-compatibility is moot. I propose "lower" be the default.

          Hmmm... i think i disagree with you there ... in no particular order...

          • including both edges is the most common convention used for range queries (ie:
            foo:[5 TO 10]

            so i think it makes sense for the gap ranges generated by range faceting to be consistent by default.

          • I don't really feel like there is any compelling reason why "lower" would make a better default then "upper" or "lower,edge" or "upper,edge", etc... in all of those cases there wouldn't be double counting, and equally valid arguments could be made for all of them – but ultimately it would come down to specific use cases.
          • the current default – while certainly not ideal, is at least consistent with some other things for people who are already familiar with them (ie: legacy date faceting, and my first point about common use of range queries above)
          • it's soooooooo easy for people to set their own default facet.range.include to whatever they want, i really don't see much harm in the current one.

          Ultimately: double counting may be something silly and legacy, but it's easy to change and none of the other options would make indisputably better defaults, so why not at least be consistent with our silly legacy?

          (anybody else have an opinion? I've got no qualms about changing my mind if i'm in the minority here)

          Show
          Hoss Man added a comment - I think we should let it be known that "facet.date" is deprecated in 3.1. That way it can be removed in a future release without waiting yet another release. good catch ... i thought i had done that.... Committed revision 1071842. - trunk Committed revision 1071843. - 3x ...i'll also add a note to the wiki I think it's very odd that the default for the include parameter is for both edges to be inclusive. This means double-counting! Yes, that's how it used to work, but I argue it never should have worked that way and I don't think anyone is actually depending on this behavior. So backwards-compatibility is moot. I propose "lower" be the default. Hmmm... i think i disagree with you there ... in no particular order... including both edges is the most common convention used for range queries (ie: foo:[5 TO 10] so i think it makes sense for the gap ranges generated by range faceting to be consistent by default. I don't really feel like there is any compelling reason why "lower" would make a better default then "upper" or "lower,edge" or "upper,edge", etc... in all of those cases there wouldn't be double counting, and equally valid arguments could be made for all of them – but ultimately it would come down to specific use cases. the current default – while certainly not ideal, is at least consistent with some other things for people who are already familiar with them (ie: legacy date faceting, and my first point about common use of range queries above) it's soooooooo easy for people to set their own default facet.range.include to whatever they want, i really don't see much harm in the current one. Ultimately: double counting may be something silly and legacy, but it's easy to change and none of the other options would make indisputably better defaults, so why not at least be consistent with our silly legacy? (anybody else have an opinion? I've got no qualms about changing my mind if i'm in the minority here)
          Hide
          Yonik Seeley added a comment -

          including both edges is the most common convention used for range queries

          Esp since syntax like [10 TO 20} wasn't supported until 3.1

          That said, it does feel like the most common expectation is to not double-count.

          Shopping sites may list ranges like 10-25,25-50,50-100, even when the ranges are exclusive, but that's just because it looks ugly to list 10-24.99, etc.

          Show
          Yonik Seeley added a comment - including both edges is the most common convention used for range queries Esp since syntax like [10 TO 20} wasn't supported until 3.1 That said, it does feel like the most common expectation is to not double-count. Shopping sites may list ranges like 10-25,25-50,50-100, even when the ranges are exclusive, but that's just because it looks ugly to list 10-24.99, etc.
          Hide
          David Smiley added a comment -

          Of course I totally agree with Yonik. I don't care that much what the default include is (upper, lower, ...) as long as it doesn't double-count. Double-counting is bad – it can lead to a bad user experience. There's something about "lower" that I feel makes it slightly better than "upper" but I can't really explain the rationale. I don't think there's any point in compliance with legacy if nobody depended on the behavior (they couldn't specify the behavior before either). Just because its easy to set the include, doesn't mean the default is arbitrary. Any way, if the default remains to double-count, I'm going to insist that my readers for the second edition of my book change this value.

          Show
          David Smiley added a comment - Of course I totally agree with Yonik. I don't care that much what the default include is (upper, lower, ...) as long as it doesn't double-count. Double-counting is bad – it can lead to a bad user experience. There's something about "lower" that I feel makes it slightly better than "upper" but I can't really explain the rationale. I don't think there's any point in compliance with legacy if nobody depended on the behavior (they couldn't specify the behavior before either). Just because its easy to set the include, doesn't mean the default is arbitrary. Any way, if the default remains to double-count, I'm going to insist that my readers for the second edition of my book change this value.
          Hide
          Hoss Man added a comment -

          It wasn't clear to me from yonik's comment what he thought was the lesser of two evils – but i clarified with him on IRC that he agreed with David.

          that makes it 2 to 1 in favor of changing it, so i'm on board.

          Show
          Hoss Man added a comment - It wasn't clear to me from yonik's comment what he thought was the lesser of two evils – but i clarified with him on IRC that he agreed with David. that makes it 2 to 1 in favor of changing it, so i'm on board.
          Hide
          Hoss Man added a comment -

          patch that switches facet.range to default to "lower" with necessary changes to tests

          Show
          Hoss Man added a comment - patch that switches facet.range to default to "lower" with necessary changes to tests
          Hide
          Hoss Man added a comment -

          Committed SOLR-1240.include-default-lower.patch ...

          Committed revision 1075603. - trunk
          Committed revision 1075613. - 3x

          David: thanks for the prodding about facet.range.include

          Show
          Hoss Man added a comment - Committed SOLR-1240 .include-default-lower.patch ... Committed revision 1075603. - trunk Committed revision 1075613. - 3x David: thanks for the prodding about facet.range.include
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1.0 release

            People

            • Assignee:
              Hoss Man
              Reporter:
              Gijs Kunze
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development