Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: None
    • Labels:
      None

      Description

      There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into three differently sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+). We should be able to quantize the results into arbitrarily sized buckets.
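As a rough illustration of what counting into arbitrarily sized buckets involves, here is a minimal Python sketch (not Solr code) that tallies values into the three unequal distance buckets from the example. `count_into_buckets` and the bucket labels are hypothetical names chosen for this illustration.

```python
from bisect import bisect_right


def count_into_buckets(values, boundaries, labels):
    """Count values into contiguous buckets of arbitrary width.

    `boundaries` are the interior cut points in increasing order; each bucket
    includes its lower bound, so boundaries=[5, 150] gives (.., 5), [5, 150)
    and [150, ..). `labels` names the buckets: one more label than cut points.
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")
    if any(b <= a for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    counts = dict.fromkeys(labels, 0)
    for value in values:
        # bisect_right returns how many cut points are <= value, i.e. the bucket index
        counts[labels[bisect_right(boundaries, value)]] += 1
    return counts
```

For the distance example you would pass boundaries `[5, 150]` and three labels. Unequal widths cost a binary search per value instead of a single division, which is why nothing in the counting itself forces the gaps to be even.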

      (Original syntax proposal removed, see discussion for concrete syntax)

      1. SOLR-2366.patch
        8 kB
        Ted Sullivan
      2. SOLR-2366.patch
        8 kB
        Grant Ingersoll
      3. SOLR-2366.patch
        6 kB
        Grant Ingersoll

          Activity

          Grant Ingersoll added a comment -

          Note also that the proposed syntax is just a draft; I'm definitely open to other syntax.

          Grant Ingersoll added a comment -

          Adds variable width gap capabilities and some tests. Still needs some more tests for edge conditions, etc. but it is something that others can look at and comment on.

          Grant Ingersoll added a comment -

          Added more tests, cleaned up the patch, all tests pass. I think it is ready to commit and will do so in a day or two or maybe this weekend.

          Hoss Man added a comment -

          The use case of facet.range (and facet.date before it) was always about having ranges generated for you automatically using a fixed gap size. If you want variable gap sizes, it's just as easy to specify them using facet.query.

          I don't really understand how your proposal adds value over using facet.query for the ranges you want to have specific widths, and then using facet.range for the rest of the ranges you want generated automatically with a specific gap.

          It just seems like a more confusing way of expressing the same thing.

          Grant Ingersoll added a comment -

          it just seems like a more confusing way of expressing the same thing

          I think it's a lot less confusing. You only have to express start, end and the size of the buckets you want. With facet.query, you have to write out each expression for every bucket and do the math on all the boundaries. I don't think it is just as easy to specify using facet.query. Not to mention that facet.query also involves a lot more parsing.

          Ryan McKinley added a comment -

          I agree with Grant that this syntax is clearer than using facet.query for each bucket.

          Just throwing it out there... but is there a way to not specify the start/end, and have that based on the values in the index? start=* end=*? In this case, it would be nice to specify the gap as round numbers. Perhaps gap=%10? assuming you have the values: 22,28,35, that would give you gaps for 20-30 and 30-40
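A minimal Python sketch of the "start=* end=*" idea with round-number gaps, assuming "%10" means boundaries aligned to multiples of 10. `round_buckets` is a hypothetical helper, and note that it needs the minimum and maximum of the values first, which is the extra pass Ryan raises further down.

```python
import math


def round_buckets(values, width):
    """Buckets aligned to multiples of `width` that cover every value.

    Hypothetical sketch: start and end come from the data and are rounded
    outward to multiples of `width`. Each bucket is (lower, upper) with the
    lower bound inclusive. Integer widths only, so range() can step them.
    """
    if not values:
        return []
    start = math.floor(min(values) / width) * width        # round the minimum down
    end = math.floor(max(values) / width) * width + width  # first multiple above the maximum
    return [(lo, lo + width) for lo in range(start, end, width)]
```

`round_buckets([22, 28, 35], 10)` returns `[(20, 30), (30, 40)]`, matching the 20-30 and 30-40 buckets described above.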

          Grant Ingersoll added a comment -

          Just throwing it out there... but is there a way to not specify the start/end, and have that based on the values in the index? start=* end=*? In this case, it would be nice to specify the gap as round numbers. Perhaps gap=%10? assuming you have the values: 22,28,35, that would give you gaps for 20-30 and 30-40

          Ryan, I think that's also an excellent variation. Sometimes you want hard start/ends, sometimes you want percentage buckets, especially for the thing I'm working on now, which is facet by function.

          Ryan McKinley added a comment -

          sometimes you want percentage buckets

          This does not map easily since you need to know the min/max value before traversing – it may actually take two passes.

          The use case I am looking at is trying to show reasonable histograms for field values, without really knowing much about the field values as input. Currently I run the stats component, and then in a second query facet within that range – gets the job done, but not ideal.
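The two-request workflow described here can be sketched in Python: given the min and max from a first request (for example from the stats component), build the facet.range parameters for the second. The function name and the choice of equal-width buckets are assumptions for illustration; facet.range, facet.range.start, facet.range.end and facet.range.gap are the existing range faceting parameters.

```python
def range_params_from_stats(field, stats_min, stats_max, buckets):
    """Second-request facet.range params splitting [stats_min, stats_max] evenly.

    stats_min/stats_max are assumed to come from a first request;
    `buckets` is the desired number of equal-width ranges.
    """
    if buckets < 1:
        raise ValueError("need at least one bucket")
    if stats_max <= stats_min:
        raise ValueError("stats_max must be greater than stats_min")

    def fmt(x):
        # Print whole numbers without ".0" so an integer field can parse them.
        return str(int(x)) if float(x).is_integer() else str(x)

    gap = (stats_max - stats_min) / buckets
    # Depending on facet.range.include, a document whose value equals
    # stats_max may sit on the end boundary rather than in the last range.
    return {
        "facet": "true",
        "facet.range": field,
        "facet.range.start": fmt(stats_min),
        "facet.range.end": fmt(stats_max),
        "facet.range.gap": fmt(gap),
    }
```

The cost is the one Ryan describes: two round trips, because the bucket layout cannot be chosen until the min and max are known.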

          Hoss Man added a comment -

          (FYI: I haven't looked at the patch, because I'm trying to focus on 3.1 bug fixes, but Grant specifically called me out on this on IRC, so I'm replying based purely on the comments)

          I think it's a lot less confusing. You only have to express start, end and the size of the buckets you want. With facet.query, you have to write out each expression for every bucket and do the math on all the boundaries.

          OK ... fair enough, I can't deny the syntax you are proposing would be easier than specifying individual facet.query params; I'm just not convinced it would be completely intuitive. If I told someone about this feature, and then showed them this request...

          facet.range.start=10&facet.range.end=100&facet.range.gap=10,20,50

          I would be hard pressed to explain why the resulting ranges were...

          10-20, 20-40, 40-90, and 90-190

          ...instead of...

          10-20, 10-30, 10-60, and 60-110

          (bearing in mind: facet.range.hardend defaults to "false")

          The existing start/end/gap params may not be 100% intuitive purely by name, but once you've read about them, they are fairly easy to grasp and not very confusing at all when you read examples later. Likewise, a collection of facet.query objects is fairly intuitive and unambiguous. I just don't feel that way about what you are suggesting (then again: I unleashed "mm" on the world, so I'm not really in a good position to throw stones)

          I'm also not convinced that it really makes sense in use cases like this (where you want variable sized buckets) to specify the gap sizes as a list, instead of specifying the boundaries of each bucket.

          What you are describing almost feels like it should be a new category of faceting – or a variation on range faceting that doesn't involve the start/end/gap params at all (but could still respect facet.range.include and facet.range.other)

          Here's my counter-proposal/suggestion...

          I'm imagining a facet.range.buckets param that (if present) would override facet.range.gap, facet.range.start, and facet.range.end (so using facet.range would require either buckets or start/end/gap). facet.range.buckets would take a comma-separated list of values representing the specific values you wanted used to define adjoining range boundary points, with some syntax ("..." seems natural) indicating "repeat the last range size until reaching this next value"

          so you could say...

          facet.range=price&facet.range.buckets=0,10,25,50,100,...,300

          ...and the resulting ranges computed would be...

          0-10, 10-25, 25-50, 50-100, 100-150, 150-200, 200-250, 250-300

          ...likewise you could say...

          facet.range=age&facet.range.buckets=0,1,...,18,25,40,60,...,100

          ...and you would get ranges for each year from 0 to 18, followed by 18-25, 25-40, 40-60, 60-80, 80-100.
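A minimal Python sketch of how the proposed "..." shorthand could be expanded into explicit boundary points. `expand_buckets` and `to_ranges` are hypothetical helper names, not Solr APIs; when the repeated step does not divide evenly, this sketch stops the last stepped range at the next listed value rather than overshooting it.

```python
def expand_buckets(spec):
    """Expand a facet.range.buckets-style list into explicit boundary points.

    "..." repeats the size of the last range until the next listed value;
    if the step does not divide evenly, the final stepped range stops at
    that value instead of overshooting it. Integer boundaries only.
    """
    tokens = [t.strip() for t in spec.split(",")]
    points = []
    for i, tok in enumerate(tokens):
        if tok == "...":
            if len(points) < 2 or i + 1 >= len(tokens) or tokens[i + 1] == "...":
                raise ValueError("'...' needs two values before it and one after it")
            step = points[-1] - points[-2]
            target = int(tokens[i + 1])
            while points[-1] + step < target:
                points.append(points[-1] + step)
            # the explicit target is appended when the loop reaches its token
        else:
            value = int(tok)
            if points and value <= points[-1]:
                raise ValueError(f"boundaries must increase: {value} after {points[-1]}")
            points.append(value)
    return points


def to_ranges(points):
    """Pair adjacent boundary points into (lower, upper) ranges."""
    return list(zip(points, points[1:]))
```

`to_ranges(expand_buckets("0,10,25,50,100,...,300"))` returns `[(0, 10), (10, 25), (25, 50), (50, 100), (100, 150), (150, 200), (200, 250), (250, 300)]`, the price ranges listed above.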

          The tricky situations would be things like...

          1. facet.range.buckets=0,2,5,...,10
          2. facet.range.buckets=0,7,...,10,20

          ...the first could be dealt with using facet.range.hardend like we do today (so the resulting buckets were "0-2,2-5,5-8,8-11") but I don't think it should. I think it should result in "0-2,2-5,5-8,8-10" ... it's hard to imagine letting a param like facet.range.hardend override the explicit "10" in the buckets list when we don't have to programmatically generate buckets of precisely the same size, particularly when you consider the implications that would carry over to the second case (I really can't imagine letting that produce any ranges other than "0-7,7-10,10-20")

          So yeah ... that's what I think would make more sense than letting you specify a comma-separated list in the "gaps" param ... fundamentally I think it comes down to the point I alluded to earlier in this comment: is specifying a sequence of varying gap sizes more intuitive for this type of use case than specifying a sequence of boundary points? I don't think it is.

          (PS: I think the discussion about dynamically generating range points based on stats in the index should really be tracked in a distinct issue ... it's got a lot of complexity to it that we've talked about on the mailing list a few times, and I don't really want to try and get into it now)
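The two readings of the stepped tail can be compared with a small hypothetical helper, `step_ranges`, which only models the part generated by repeating a step toward an explicit value (for instance stepping by 3 from 5 toward 10, as in the quoted "5-8,8-11" versus "5-8,8-10", or by 7 from 7 toward 10 in the second case). It is an illustration, not how Solr implements hardend.

```python
def step_ranges(start, step, target, extend_past_target):
    """Ranges produced by repeating `step` from `start` toward `target`.

    extend_past_target=True keeps every range at full width, the way
    facet.range.hardend=false treats the last gap today, so the final range
    may end beyond `target`. False cuts the final range short at `target`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    ranges = []
    lo = start
    while lo < target:
        hi = lo + step
        if hi > target and not extend_past_target:
            hi = target
        ranges.append((lo, hi))
        lo = hi
    return ranges
```

With extension, stepping by 7 from 7 toward 10 yields `[(7, 14)]`, which would overlap an explicit 10-20 range that follows; cutting short yields `[(7, 10)]`, which is why the explicit value should win.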

          Grant Ingersoll added a comment -

          Hoss, I can live with ranges. I had originally thought of doing that, but decided this syntax is simpler, especially for dates. Then again, we could just as well support both. The nice thing about ranges is that you can have non-contiguous ranges, which might be interesting to some.

          As for:

          would be hard pressed to explain why the resulting ranges were...

          It really isn't that hard to explain:

          start + gap[0], prevEnd + gap[1], ... prevEnd + gap[i], ... prevEnd + gap[n] (and repeating until end)

          In other words, it's a variable width gap starting at whatever the last end point was.
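Read literally, the formula can be sketched in Python as below. This is an illustration under stated assumptions, not the patch's code: `ranges_from_gaps` is a hypothetical name, "repeating until end" is taken to mean that the final gap is reused, and the last range is allowed to run past end (as with facet.range.hardend=false).

```python
def ranges_from_gaps(start, end, gaps):
    """Variable-width ranges: each starts at the previous end and is gaps[i] wide.

    Assumption: once the gap list is exhausted, the last gap repeats until
    `end` is reached, and the final range may extend past `end`.
    """
    if not gaps or any(g <= 0 for g in gaps):
        raise ValueError("gaps must be a non-empty list of positive numbers")
    ranges = []
    lo = start
    i = 0
    while lo < end:
        gap = gaps[min(i, len(gaps) - 1)]  # reuse the final gap after the list runs out
        ranges.append((lo, lo + gap))
        lo += gap
        i += 1
    return ranges
```

Under these assumptions, `ranges_from_gaps(0, 100, [10, 20, 50])` returns `[(0, 10), (10, 30), (30, 80), (80, 130)]`; reading "repeating" as cycling through the whole list instead would change the later ranges.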

          Yonik Seeley added a comment -

          would be hard pressed to explain why the resulting ranges were...

          I agree - it requires summing all previous deltas to figure out what the current range actually is.
          I think we need to drive this from use cases. The first use case that comes to mind is price ranges... and it would be a pain to insert a new price range if we were just dealing with a list of deltas. Anything I can think of where you would want variable sized buckets, it seems like you care more about the absolute values of those buckets than their delta to the previous bucket.

          I pretty much came up with what Hoss suggested I think (except I didn't think of the "..." syntax).

          We could potentially support a mix of absolute starting points and ranges:
          0,5,10,20,100-1000
          Normally one would stick to one syntax or the other in a single request, but we could support both in a single parameter as a convenience.
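One possible reading of the mixed syntax, sketched in Python. The semantics are an assumption, not part of the proposal: each bare value starts a range that ends where the next entry starts, a lo-hi entry is a self-contained range, and a bare value at the very end only closes the previous range. `parse_mixed` is a hypothetical name, and the sketch accepts only non-negative integers because "-" doubles as the range delimiter.

```python
def parse_mixed(spec):
    """Parse a list mixing absolute points and explicit ranges, e.g. "0,5,10,20,100-1000"."""
    entries = []
    for tok in (t.strip() for t in spec.split(",")):
        if "-" in tok:
            lo, hi = (int(x) for x in tok.split("-", 1))
            entries.append((lo, hi))
        else:
            entries.append((int(tok), None))  # bare value: upper bound set below
    ranges = []
    for i, (lo, hi) in enumerate(entries):
        if hi is None:
            if i + 1 == len(entries):
                break  # trailing bare value only closes the previous range
            hi = entries[i + 1][0]  # range ends where the next entry starts
        if hi <= lo:
            raise ValueError(f"empty or inverted range {lo}-{hi}")
        ranges.append((lo, hi))
    return ranges
```

Under these assumptions, `parse_mixed("0,5,10,20,100-1000")` returns `[(0, 5), (5, 10), (10, 20), (20, 100), (100, 1000)]`.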

          Jan Høydahl added a comment -

          +1 for using absolute values instead of gap values
          +1 for keeping the bucket spec as a separate param, including start and end
          +1 for letting the start/end in the spec automatically disable hardend

          I wrote down some thoughts the other day which are almost exactly what Hoss suggests, only I called it facet.range.spec. I was going to start another issue, but now that the discussion is rolling here, here we go.

          The facet.range.spec must be intuitive and should include start, all absolute boundaries and end. Sample:

          facet.range.spec=0,5,25,50,100,400 ==> 0-5, 5-25, 25-50, 50-100, 100-400.
          

          To specify the gap size instead of next absolute threshold, we could have a +N syntax:

          facet.range.spec=0,5,25,+25,+50,400
          

          would be equivalent to the above absolute spec.

          A +N value would repeat as many times as needed to reach the next absolute value:

          facet.range.spec=0,5,+10,25,50,100,+100,400 ==> 0-5, 5-15, 15-25, 25-50, 50-100, 100-200, 200-300, 300-400
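The +N rule can be sketched in Python as below. `expand_spec` is a hypothetical name, and the proposal does not settle what to do when an increment overshoots the next absolute value; this sketch simply rejects that case with a ValueError.

```python
def expand_spec(spec):
    """Expand a facet.range.spec-style list in which N is an absolute boundary
    and +N is an increment, e.g. "0,5,+10,25,50,100,+100,400".

    Assumed rules: a +N directly followed by an absolute value repeats while it
    stays below that value (so the last stepped range may be cut short), and is
    rejected if even one step would overshoot it; any other +N applies once.
    Integers only.
    """
    tokens = [t.strip() for t in spec.split(",")]
    if tokens[0].startswith("+"):
        raise ValueError("spec must start with an absolute value")
    points = [int(tokens[0])]
    for i in range(1, len(tokens)):
        tok = tokens[i]
        if tok.startswith("+"):
            step = int(tok[1:])
            if step <= 0:
                raise ValueError(f"increment must be positive: {tok}")
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and not nxt.startswith("+"):
                target = int(nxt)
                if points[-1] + step > target:
                    raise ValueError(f"{tok} after {points[-1]} overshoots {target}")
                while points[-1] + step < target:
                    points.append(points[-1] + step)
            else:
                points.append(points[-1] + step)  # no absolute value follows: apply once
        else:
            value = int(tok)
            if value <= points[-1]:
                raise ValueError(f"boundaries must increase: {value} after {points[-1]}")
            points.append(value)
    return points
```

`expand_spec("0,5,+10,25,50,100,+100,400")` returns `[0, 5, 15, 25, 50, 100, 200, 300, 400]`, the boundaries of the ranges listed above.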
          

          Date example:

          facet.range.spec=*,2000-01-01T00:00:00Z,+5YEARS,NOW/YEAR,+1MONTH,NOW
          

          ...gives a range before 2000, two 5-year ranges 2000-2005, 2005-2010, one range until start of this year 2010-2011, then monthly ranges for this year until now.

          Now, having all this power of defining buckets available, it would be easy to introduce (i.e. feature creep) a facet.range.labels param. Imagine:

          facet.range.spec=NOW/MONTH-1MONTH,NOW/MONTH,NOW/DAY-1DAY,NOW/DAY,NOW/HOUR,NOW,*
          facet.range.labels="Last month","This month","Yesterday","Today","This hour","Future"
          
          Jan Høydahl added a comment -

          Both the date ranges above and other typical use cases call for overlapping buckets. This would be a generalization of Yonik's range suggestion. Imagine a real estate site with a bedrooms facet:

          f.bedrooms.facet.range.spec=1..*,2..*,3..*,4..*
          f.bedrooms.facet.range.labels="One or more","Two or more","Three or more","Four or more"
          

          I've chosen ".." as range delimiter since "-" would be confused with Date Math.
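For comparison, each overlapping "lo..hi" bucket maps directly onto a Lucene range query, so the same buckets can already be requested as one facet.query per range. A minimal Python sketch with a hypothetical helper name; it does not validate the bounds themselves.

```python
def spec_to_queries(field, spec):
    """Turn "lo..hi" tokens (either side may be "*") into field:[lo TO hi] queries."""
    queries = []
    for tok in (t.strip() for t in spec.split(",")):
        lo, sep, hi = tok.partition("..")  # split on the first ".." only
        if not sep or not lo or not hi:
            raise ValueError(f"expected lo..hi, got {tok!r}")
        queries.append(f"{field}:[{lo} TO {hi}]")
    return queries
```

`spec_to_queries("bedrooms", "1..*,2..*,3..*,4..*")` returns `["bedrooms:[1 TO *]", "bedrooms:[2 TO *]", "bedrooms:[3 TO *]", "bedrooms:[4 TO *]"]`.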

          Herman J Kiefus added a comment -

          With absolute ranges (no gap) couldn’t we also support alphabetic ranges? I would find this useful.

          Herman J Kiefus added a comment -

          Also regarding arbitrary ranges:

          While using facet.query allows us to construct arbitrary ranges, we must then pick them out of the results separately. This becomes more difficult if we facet on two or more arbitrary fields/expressions. Essentially we have to parse the results, grouping by expression and then picking out each range in the order we want to display it. This would seem unnecessary if we had the ability to add n absolute ranges to a facet.range.

          Hoss Man added a comment - - edited

          In no particular order...

          • I like Jan's facet.range.spec naming suggestion better than my facet.range.buckets suggestion ... but I think facet.range.series, facet.range.seq, or facet.range.sequence might be better still.
          • I think Jan's point about N vs +N in the sequence list as a way to mix absolute values and increments definitely makes sense, and would be consistent with the existing date math syntax.
          • The complexity with supporting both absolute values and increments would be the question of what Solr should do with input like facet.range.seq=10,20,+50,+100,120,150. What ranges would we return? (10-20, 20-70, 70-???....) Would it be an error? Would we give back ranges that overlapped? What about facet.range.seq=10,50,+50,100,150&facet.range.include=all .. would that result in one of the ranges being [100 TO 100] or would we throw that one out? (I think it would be wise to start out only implementing the absolute value approach, since that seems (to me) the more useful option of the two, and then consider adding the incremental values as a separate issue later after hashing out the semantics of these types of situations)
          • A few of Jan's sample input suggestions used * at either the start or end of the sequence to denote "everything before" the second value or "everything after" the second-to-last value. I don't think we need to support this syntax; I think the existing facet.range.other would still be the right way to support this with facet.range.sequence. If you want "everything before" and/or "everything after" use facet.range.include=before and/or facet.range.include=after .. otherwise it would be confusing to decide what things like facet.range.include=before&facet.range.seq=*,10,20 and facet.range.include=none&facet.range.seq=*,10,20 mean.
          • I REALLY don't think we should try to implement something like Jan's facet.range.labels suggestion. I can't imagine any way of supporting it that wouldn't prevent or radically complicate the "..." type continuation of series I suggested before, and that seems like a much more powerful feature than labels. If a user is going to provide a label for every range, then you must enumerate every range, and you might as well enumerate them (and label them) with facet.query where the label and the query can be side by side.

          This...

          facet.query={!label="One or more"}bedrooms:[1 TO *]
          facet.query={!label="Two or more"}bedrooms:[2 TO *]
          facet.query={!label="Three or more"}bedrooms:[3 TO *]
          facet.query={!label="Four or more"}bedrooms:[4 TO *]
          

          ...seems way more readable, and less prone to user error in tweaking, than this...

          f.bedrooms.facet.range.spec=1..*,2..*,3..*,4..*
          f.bedrooms.facet.range.labels="One or more","Two or more","Three or more","Four or more"
          
          • Herman commented...

          While using facet.query allows us to construct arbitrary ranges, we must then pick them out of the results separately. This becomes more difficult if we arbitrarily facet on two or more fields/expressions.

          I don't see that as being a particularly hard problem that we need to worry about helping users avoid, especially since users can annotate those queries using localparams and set any arbitrary key=val pairs on them to help organize them and identify them later when parsing the response...

          facet.query={!group=bed label="One or more"}bedrooms:[1 TO *]
          facet.query={!group=bed label="Two or more"}bedrooms:[2 TO *]
          facet.query={!group=bed label="Three or more"}bedrooms:[3 TO *]
          facet.query={!group=bed label="Four or more"}bedrooms:[4 TO *]
          facet.query={!group=size label="Small"}sqft:[* TO 1000]
          facet.query={!group=size label="Medium"}sqft:[1000 TO 2500]
          facet.query={!group=size label="Large"}sqft:[2500 TO *]
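On the client side, that grouping can be done by reading the local params back out of each response key. A minimal Python sketch, assuming the facet_queries section of the response is keyed by the full original facet.query string (local params included) and preserves request order; `group` and `label` are just the ad-hoc keys from the example, not parameters Solr itself interprets, and `group_facet_queries` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Captures the local-params section at the start of a query: {!...}
LOCAL_PARAMS = re.compile(r"^\{!(?P<params>[^}]*)\}")
# key=value pairs; the value is either double-quoted or runs to the next space
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def group_facet_queries(facet_queries):
    """Group facet.query counts by their `group` local param.

    `facet_queries` maps each original facet.query string to its count.
    Returns {group: [(label, count), ...]} in input order; queries without a
    label fall back to the raw query string, and queries without a group are
    collected under "".
    """
    grouped = defaultdict(list)
    for query, count in facet_queries.items():
        params = {}
        match = LOCAL_PARAMS.match(query)
        if match:
            for key, quoted, bare in PAIR.findall(match.group("params")):
                params[key] = quoted if quoted else bare
        grouped[params.get("group", "")].append((params.get("label", query), count))
    return dict(grouped)
```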
          
          Jan Høydahl added a comment -

          If you want "everything before" and/or "everything after" use facet.range.include=before and/or facet.range.include=after .. otherwise it would be confusing to decide what things like facet.range.include=before&facet.range.seq=*,10,20 and facet.range.include=none&facet.range.seq=*,10,20 mean.

          I think you meant facet.range.other=before/after, not facet.range.include=before/after; see, the syntax is confusing.

          My main point with the examples was that facet.range.spec should not require facet.range.start and facet.range.end; instead, the first and last values in the spec list should be taken as the start and end. In my opinion,

          facet.range.spec=0,5,25,50,100,200,400
          

          reads more fluently, and makes it easier to see that the first and last buckets will be 0-5 and 200-400, than

          facet.range.spec=5,25,50,100,200
          facet.range.start=0
          facet.range.end=400
          

          And when it comes to before/after,

          facet.range.spec=0,5,25,50,100,200,400,*
          

          is, in my mind, better than

          facet.range.spec=5,25,50,100,200
          facet.range.start=0
          facet.range.end=400
          facet.range.other=after
          

          Simply document that facet.range.spec is mutually exclusive with the parameters gap, start, end and other.

          I REALLY don't think we should try to implement something like Jan's facet.range.labels suggestion

          Sure, this is not a priority, since it's possible with facet.query.

          +1 on concentrating on a simple "spec" or "sequence" feature in some flavour
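          A minimal Python sketch of the reading proposed above, assuming a purely absolute spec in which the first and last values serve as start and end, and a "*" at either end means "unbounded" (None in the sketch). The function names are hypothetical, not Solr code. The second function builds the same buckets from start/end plus the inner values, treating facet.range.other=after as one extra open-ended range, which simplifies how Solr actually reports the "after" count.

```python
def parse_absolute_spec(spec):
    """Turn a spec such as "0,5,25,*" into (lower, upper) pairs.

    The first and last values double as start and end; a "*" at either
    end becomes None, meaning that side of the range is unbounded.
    """
    tokens = [t.strip() for t in spec.split(",")]
    if len(tokens) < 2:
        raise ValueError("a spec needs at least two boundaries")
    bounds = [None if t == "*" else float(t) for t in tokens]
    # "*" is only meaningful as the very first or very last entry.
    for i, b in enumerate(bounds):
        if b is None and 0 < i < len(bounds) - 1:
            raise ValueError("'*' may only appear at either end of the spec")
    return list(zip(bounds, bounds[1:]))


def ranges_from_start_end(inner, start, end, other_after=False):
    """The same buckets expressed the facet.range.start/end way.

    other_after=True appends the open-ended "after" bucket as one more
    range, a simplification of how facet.range.other=after is reported.
    """
    bounds = [start, *inner, end]
    ranges = list(zip(bounds, bounds[1:]))
    if other_after:
        ranges.append((end, None))
    return ranges
```

Called with "0,5,25,50,100,200,400,*" and with ([5, 25, 50, 100, 200], 0, 400, other_after=True) respectively, both functions yield the same seven ranges, from (0, 5) through (400, None), which is the equivalence argued for above.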

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Hoss Man added a comment -

          Guess my main point with the examples was to suggest that a facet.range.spec should not require facet.range.start and facet.range.end, but that the first and last values in the spec list should be taken as start and end, instead of requiring start and end in addition. ...

          Simply document that facet.range.spec is mutually exclusive to the parameters gap,start,end and other.

          I respect your argument, but I think that if this new "spec" param is going to be mutually exclusive with facet.range.other as well as with all of the existing mandatory facet.range params (facet.range.gap, facet.range.start, and facet.range.end), then what you're describing really shouldn't be an extension of "facet.range" at all. It sounds like it should be a completely distinct type of faceting ("sequence faceting"?) with its own params and its own section in the response, i.e....

          facet.seq=fieldName
          f.fieldName.facet.seq.spec=0,5,25,50,100,200,400,*
          f.fieldName.facet.seq.include=edge
          

          (where facet.seq.include has the same semantics as facet.range.include ... except I don't think "edge" makes sense at all without the "other" param concept; I need to think it through more)

          Otherwise it could get really confusing for users trying to understand which "facet.range.*" params do/don't make sense if they start using facet.range.gap and then switch to facet.range.spec (or vice versa), i.e.: "How come I'm not getting the before/after ranges when I use 'facet.range.spec=0,5,25,50&facet.range.other=after'?"

          Jan Høydahl added a comment -

          I disagree that this is a fundamentally different feature requiring its own plugin. It's simply an alternative way of specifying the gaps for range faceting. I don't mind working through the documentation to describe clearly how facet.range.spec interacts with the other params, and also implementing a parameter check that throws an exception if the user supplies incompatible params.

          What do others think?
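          One possible shape for the parameter check mentioned above, as a Python sketch. A plain dict stands in for the request params, the names are hypothetical, and only the global (non per-field) parameter names are checked, so per-field overrides are not yet addressed.

```python
# Parameters the facet.range.spec proposal treats as incompatible with it.
CONFLICTING_WITH_SPEC = ("facet.range.gap", "facet.range.start",
                         "facet.range.end", "facet.range.other")


def check_spec_params(params):
    """Raise ValueError if facet.range.spec is combined with a conflicting
    parameter. `params` is a plain dict standing in for Solr's request
    params; per-field (f.<field>.*) names are not considered here."""
    if "facet.range.spec" not in params:
        return None
    clashes = [name for name in CONFLICTING_WITH_SPEC if name in params]
    if clashes:
        raise ValueError("facet.range.spec cannot be combined with "
                         + ", ".join(clashes))
    return None
```

Failing loudly here is the alternative to silently ignoring the conflicting params, which would leave users to discover the mismatch only from unexpected results.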

          Jan Høydahl added a comment -

          I've drafted documentation for the facet.range.spec param as I envision it at http://wiki.apache.org/solr/VariableRangeGaps

          Hoss Man added a comment -

          Jan: I took a look at r3 of your VariableRangeGaps wiki page. Here are the things I'm concerned about, because they seem a bit confusing/ambiguous....

          1) We need to decide what the behavior should be when the spec lists values out of order (e.g. 10, 50, 30). It might be tempting to say "allow them, and swap the values" (i.e. "10-50, 30-50"), but the merit of that approach doesn't seem worth the risk of silently hiding errors (e.g. if the user made a typo and meant "10-50, 50-130"), not to mention that it could be really hard to understand what's going on when some values are specified absolutely and some as increments (see bullet #3 in my "02/Apr/11 23:43" comment above, i.e. what ranges would we produce for 10,20,+50,+100,120,150?).

          I would suggest defining any case where the spec contains an absolute value N after an (effective) value M with N < M as an error, and failing fast.

          I'm still not sure what (if anything) should be done about overlapping ranges that appear out of order (e.g. 0,100,50..90,150: is that "0-100,50-90,90-150"?)
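          To make the fail-fast rule concrete, here is a Python sketch under one assumption: each increment is applied exactly once to the running value, with no implicit repetition. The function name is hypothetical. Under that reading, 10,20,+50,+100,120,150 is rejected because 120 follows the effective value 170.

```python
def resolve_mixed_spec(spec):
    """Resolve a spec mixing absolute values ("120") and increments ("+50").

    Each increment is applied once to the running (effective) value. An
    absolute value lower than the effective value before it is rejected
    (fail fast) instead of being silently reordered.
    """
    bounds = []
    for token in (t.strip() for t in spec.split(",")):
        if token.startswith("+"):
            if not bounds:
                raise ValueError("a spec cannot start with an increment")
            value = bounds[-1] + float(token[1:])
        else:
            value = float(token)
            if bounds and value < bounds[-1]:
                raise ValueError(
                    f"{token} follows the effective value {bounds[-1]:g}")
        bounds.append(value)
    return list(zip(bounds, bounds[1:]))
```

A well-ordered mix such as 10,50,+50,150 resolves to the ranges 10-50, 50-100, 100-150.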

          2) Independent of my opinion on the * syntax, I'm a little concerned by the discrepancy in these examples...

          facet.range.spec=*,10,50,100,250,* - gives 5 ranges: MIN-10, 10-50, 50-100, 100-250, 250->MAX
          facet.range.spec=*,10,+40,+50,250,* - gives exactly the same ranges, using relative gap size
          facet.range.spec=0,+10,50,250,* - gives ranges: 0-10, 10-20, 20-30, 30-40, 40-50, 50-250, 250-MAX
          facet.range.spec=0,10,50,+50,+100,* - gives ranges: 0-10, 10-50, 50-100, 100-200, 200-300 repeating until max
          

          The first three examples suggest that * will be treated as "-Infinity" and "+Infinity" based on position (i.e. the first and last ranges will be unbounded on one end), but in the last example the wording "...100-200, 200-300 repeating until max" seems inconsistent with that.

          In general, I'm concerned about providing a feature that would attempt to produce an infinite number of range queries, but even if that is intentional/acceptable, the discrepancy in syntax bothers me. I would suggest that that sequence should result in the ranges "0-10, 10-50, 50-100, 100-200, 200-Infinity".

          If we want to support the idea of "repeat the last increment continuously", that should have its own "repeat" syntax, such as the "..." (three dots) I suggested in my "17/Feb/11 23:50" comment above. I would argue that this should only be legal after an increment and before a concrete value (e.g. 0,+10,...,100). Requiring it to follow an increment seems like a given (otherwise what exactly are you repeating?); requiring that it be followed by an absolute value is based on my concern that if it's the last item in the spec (or the last item before *), it results in an infinite number of ranges.
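          A Python sketch of the proposed "..." token, assuming it repeats the preceding increment, stops strictly below the next absolute value, and then uses that value as the final boundary. The function name is hypothetical and "*" is not handled. Both legality rules from the comment are enforced, so the output is always finite.

```python
def expand_repeats(spec):
    """Expand the proposed "..." token into explicit boundaries.

    "..." repeats the preceding increment until the next absolute value,
    so "0,+10,...,100" becomes 0,10,20,...,100. It must follow an increment
    and be followed by an absolute value, so the result is always finite.
    """
    tokens = [t.strip() for t in spec.split(",")]
    bounds = []
    for i, token in enumerate(tokens):
        if token == "...":
            prev = tokens[i - 1] if i > 0 else ""
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if not prev.startswith("+"):
                raise ValueError("'...' must follow an increment")
            if nxt in ("", "*", "...") or nxt.startswith("+"):
                raise ValueError("'...' must be followed by an absolute value")
            step, stop = float(prev[1:]), float(nxt)
            if step <= 0:
                raise ValueError("the repeated increment must be positive")
            # Keep stepping while strictly below the stop value; the stop
            # value itself is appended when its own token is processed.
            while bounds[-1] + step < stop:
                bounds.append(bounds[-1] + step)
        elif token.startswith("+"):
            if not bounds:
                raise ValueError("a spec cannot start with an increment")
            bounds.append(bounds[-1] + float(token[1:]))
        else:
            bounds.append(float(token))
    return bounds
```

For example, 0,+10,...,100 expands to 0, 10, 20, ..., 90, 100, while 0,+30,...,100 expands to 0, 30, 60, 90, 100, leaving a shorter final range.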

          3) The final comment on the page (in the section about facet.range.spec) says...

          This parameter can be combined with facet.range.include, but is mutually exclusive to facet.range.gap, facet.range.begin, facet.range.end and facet.range.other, resulting in an exception if an incompatible mix is attempted.

          That doesn't seem specific enough about what is/isn't going to be allowed, particularly since all of the facet.range params can be specified on a per-field basis.

          Imagine an index of "historic people" docs that provides range faceting on a bunch of date fields for significant milestones using common facet.range.start, facet.range.end, and facet.range.gap params, where the Solr admin wants to add "facet.range=height" and an "f.height.facet.range.spec" param....

          facet.range=birth_date
          facet.range=first_notable_historic_event
          facet.range=last_notable_historic_event
          facet.range=death_date
          facet.range.start=1500-01-01T00:00:00Z
          facet.range.end=NOW/YEAR+1YEAR
          facet.range.hardend=false
          facet.range.gap=+10YEARS
          facet.range=height
          f.height.facet.range.spec=*,100,+10,...,300,*
          

          ...that should be a totally legal use case, right? To mix and match this way? But how will the code behave? Technically the "height" field has both facet.range.spec and facet.range.start params specified, and there is no way to "unset" the default facet.range.start/facet.range.end/facet.range.gap params in the context of the "height" field.
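          The ambiguity can be reproduced with a small Python sketch of per-field parameter resolution, assuming the usual fallback in which f.<field>.<param> wins and the global <param> is used otherwise (the behaviour of SolrParams.getFieldParam). A plain dict stands in for the request, and the helper names are hypothetical. For "height", all four range params end up in effect, so a naive mutual-exclusion check would reject this request.

```python
def field_param(params, field, name):
    """Per-field lookup: f.<field>.<name> wins, else the global value."""
    return params.get(f"f.{field}.{name}", params.get(name))


RANGE_PARAMS = ("facet.range.spec", "facet.range.start",
                "facet.range.end", "facet.range.gap")


def effective_range_params(params, field):
    """Every range param that is in effect for `field` after fallback."""
    resolved = {name: field_param(params, field, name) for name in RANGE_PARAMS}
    return {name: value for name, value in resolved.items() if value is not None}


# The "historic people" request above as single-valued params; the
# repeated facet.range=<field> keys are left out for brevity.
request = {
    "facet.range.start": "1500-01-01T00:00:00Z",
    "facet.range.end": "NOW/YEAR+1YEAR",
    "facet.range.hardend": "false",
    "facet.range.gap": "+10YEARS",
    "f.height.facet.range.spec": "*,100,+10,...,300,*",
}
```

The date fields see only the global start/end/gap, while "height" sees its own spec plus the inherited start/end/gap, with no syntax to "unset" the latter.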

          4) Related to the same sentence as #3: it says that facet.range.include can be used with facet.range.spec, but it doesn't explain how it will be interpreted. This is kind of important, since values like "outer" define how the "before" and "after" ranges are affected, and values like "edge" affect the first and last "gap ranges".

          Should all ranges produced by facet.range.spec be considered "gap" ranges? Even the ones with no lower/upper bound?

          What would the following combination mean...

          facet.range.spec=100,150,200,250,*
          facet.range.include=outer
          facet.range.include=edge
          
          • Are "100" and "250" considered "edge" boundaries?
          • Is "250" considered an "outer" boundary (on the equivalent of an "after" range)?

          What about when the spec includes overlapping ranges?

          facet.range.spec=50..150,100..200,150,*
          facet.range.include=outer
          facet.range.include=edge
          
          • Is "200" an "edge" boundary?
          • Is "150" an "outer" boundary?
          Jan Høydahl added a comment -

          Hoss: Good comments; these need to be decided upon, including the corner cases.

          1)

          I would suggest define any case where the spec contains absolute value N after (effective) value M where N < M as an error and fail fast.

          Agree

          Still not sure what (if anything) should be done about overlapping ranges that appear out of order (ie: 0,100,50..90,150 ... is that "0-100,50-90,90-150" ?)

          If all gaps are specified as explicit ranges there is no ambiguity, so we could require all gaps to be explicit ranges if one wants to use that syntax?

          2)

          The first three examples suggest that * will be treated as "-Infinity" and "+Infinity" based on position (ie: the first and last ranges will be unbounded on one end) but in the last example the wording "...100-200, 200-300 repeating until max" seems inconsistent with that.

          Agree. The 0,10,50,+50,+100,* example would create an infinite number of gaps, which is less than desirable. But 0,10,50,+50,+100,500 would give repeating 100-gaps up to the upper bound 500, while 0,10,50,+50,+100,500,* would additionally give a last range 500-*. That was the intended syntax.
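          A Python sketch of the semantics described here, under assumptions inferred from the wiki examples: an increment followed by another increment is applied once, an increment followed by an absolute value repeats until that value, and a trailing "*" adds one open-ended range. The function name is hypothetical, a leading "*" is not handled, and a final increment with no absolute bound after it is rejected rather than repeated "until max".

```python
def resolve_implicit_repeats(spec):
    """Sketch of the semantics described above for specs such as
    "0,10,50,+50,+100,500,*".

    An increment followed by another increment is applied once; an
    increment followed by an absolute value repeats until that value.
    A trailing "*" adds one final open-ended range (None = unbounded).
    """
    tokens = [t.strip() for t in spec.split(",")]
    open_end = tokens[-1] == "*"
    if open_end:
        tokens = tokens[:-1]
    bounds = []
    for i, token in enumerate(tokens):
        if not token.startswith("+"):
            bounds.append(float(token))
            continue
        if not bounds:
            raise ValueError("a spec cannot start with an increment")
        step = float(token[1:])
        if step <= 0:
            raise ValueError("increments must be positive")
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is None:
            # e.g. "...,+100,*": repeating "until max" never terminates.
            raise ValueError("a final increment would repeat without limit")
        if nxt.startswith("+"):
            bounds.append(bounds[-1] + step)
        else:
            # Repeat up to, not including, the next absolute value, which
            # is appended when its own token is processed.
            stop = float(nxt)
            while bounds[-1] + step < stop:
                bounds.append(bounds[-1] + step)
    ranges = list(zip(bounds, bounds[1:]))
    if open_end:
        ranges.append((bounds[-1], None))
    return ranges
```

Under these assumptions 0,+10,50,250,* gives 0-10, 10-20, 20-30, 30-40, 40-50, 50-250 and 250-*, and 0,10,50,+50,+100,500 gives 0-10, 10-50, 50-100, 100-200, 200-300, 300-400, 400-500.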

          If we want to support the idea of "repeat the last increment continuously" that should be with it's own "repeat" syntax such as the "..." (three dots) i suggested in comment "17/Feb/11 23:50" above. I would argue that this should only be legal after an increment and before a concrete value (ie: 0,+10,...,100). Requiring it to follow an increment seems like a given (otherwise what exactly are you repeating?) requiring that it be followed by an absolute value is based on my concern that if it's the last item in the spec (or the last item before *) it results in an infinite number of ranges.

          Agree. Alternatively, if Solr could compute myField.max(), a more useful value for "*" could be computed, but that would probably be hard to scale in a multi-shard setting.

          That seems like it isn't specific enough about what is/isn't going to be allowed – particularly since all of the facet.range params can be specified on a per field basis.

          I didn't really think much about the global params. Silently ignoring gap, start, end and other would be one way to go, but then there is no explicit error feedback in case of a misunderstanding; the user will just see that he does not get back what he expected, and start reading the documentation.

          I have no good answer to this, other than inventing some syntax. The default could be that facet.range.spec respects the global values for start and end, but also allows explicitly overriding them as part of the spec with a special syntax.
          The following params would result in the ranges 0-1, 1-2, 2-3, 3-5, 5-10:

          facet.range.start=0
          facet.range.end=10
          facet.range.gap=2
          f.bedrooms.facet.range.spec=1,2,3,5
          

          But these params would also result in the same ranges, because start and end are given with a special syntax (N.. for start, ..M for end):

          facet.range.start=100
          facet.range.end=200
          facet.range.gap=10
          f.bedrooms.facet.range.spec=0..,1,2,3,5,..10
          

          This would be equivalent to adding the two params f.bedrooms.facet.range.start=0&f.bedrooms.facet.range.end=10, which could still be allowed as an alternative. If the first value of the spec is not of the form N.., we'll require a facet.range.start. If the last value of the spec is not of the form ..M, we'll require a facet.range.end.

          Also, it must not be allowed to specify both a global facet.range.gap and a global facet.range.spec.
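          A Python sketch of the proposed N.. / ..M markers, assuming the start and end arguments stand in for whatever facet.range.start/facet.range.end resolve to for the field (global or per-field). The function name is hypothetical, the global facet.range.gap is simply not consulted for a field that has a spec, and the gap/spec exclusivity rule above is not checked.

```python
def resolve_spec_with_defaults(spec, start=None, end=None):
    """Build (lower, upper) ranges from a spec that may carry its own
    start ("N.." as the first entry) and end ("..M" as the last entry).

    Without those markers, `start`/`end` (standing in for the global or
    per-field facet.range.start/end values) are used and are required.
    """
    tokens = [t.strip() for t in spec.split(",")]
    if tokens[0].endswith(".."):
        start = tokens.pop(0)[:-2]   # "0.." -> "0"
    if tokens and tokens[-1].startswith(".."):
        end = tokens.pop()[2:]       # "..10" -> "10"
    if start is None or end is None:
        raise ValueError("need N../..M in the spec or facet.range.start/end")
    bounds = [float(start), *(float(t) for t in tokens), float(end)]
    return list(zip(bounds, bounds[1:]))
```

With start="0", end="10" and spec 1,2,3,5, or with start="100", end="200" and spec 0..,1,2,3,5,..10, the result is the same ranges 0-1, 1-2, 2-3, 3-5, 5-10 as in the two bedroom examples.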

          Would this be a good "compromise"? My primary reason for suggesting this is to give users a terse, intuitive syntax for ranges.

          4)

          Should all ranges produced by facet.range.spec be considered "gap" ranges? even the ones with no lower/upper bound?

          Good question. I think the values facet.range.include=upper/lower are clear. Outer/edge would need some more work/definition.

          Jan Høydahl added a comment -

          I've taken another pass at the wiki page, adding the newly proposed start/end syntax and an example or two. The "mutually exclusive" sentence now boils down to facet.range.gap and facet.range.spec being mutually exclusive (on the same field). Have a look at http://wiki.apache.org/solr/VariableRangeGaps#facet.range.spec

          Jan Høydahl added a comment -

          Here's Grant's original syntax proposal, which was removed from the issue description to avoid confusion:

          I'd propose the syntax to be a comma separated list of sizes for each bucket. If only one value is specified, then it behaves as it currently does. Otherwise, it creates the different size buckets. If the number of buckets doesn't evenly divide up the space, then the size of the last bucket specified is used to fill out the remaining space (not sure on this)
          For instance,
          facet.range.start=0
          facet.range.end=400
          facet.range.gap=5,25,50,100

          would yield buckets of:
          0-5,5-30,30-80,80-180,180-280,280-380,380-400
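          Grant's original proposal can be checked with a short Python sketch (the function name is hypothetical): the listed sizes are used in order, the last size is reused, and the final bucket is clipped at facet.range.end. That clipping matches what facet.range.hardend=true does for ordinary single gaps; with the default hardend=false the last bucket would presumably be a full-width 380-480 instead.

```python
def variable_gap_buckets(start, end, gaps):
    """Grant's original proposal: facet.range.gap is a list of bucket
    sizes used in order; the last size is reused until `end` is reached,
    and the final bucket is clipped at `end`."""
    if not gaps or any(g <= 0 for g in gaps):
        raise ValueError("gaps must be a non-empty list of positive sizes")
    buckets, lower, i = [], start, 0
    while lower < end:
        size = gaps[min(i, len(gaps) - 1)]  # reuse the last size
        upper = min(lower + size, end)      # clip the final bucket
        buckets.append((lower, upper))
        lower, i = upper, i + 1
    return buckets
```

variable_gap_buckets(0, 400, [5, 25, 50, 100]) reproduces the seven buckets listed above, and a single-element list such as [2] falls back to today's evenly spaced behaviour.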

          Jan Høydahl added a comment -

          One thing this improvement needs to tackle is how to return the range buckets in the response. The simple facet_ranges format will not be enough:

          <lst name="facet_ranges">
            <lst name="url_length">
              <lst name="counts">
                <int name="42">1</int>
                <int name="45">1</int>
                <int name="51">1</int>
                <int name="66">1</int>
              </lst>
              <int name="gap">3</int>
              <int name="start">0</int>
              <int name="end">102</int>
            </lst>
          </lst>
          

          We need something which can return the explicit ranges, similar to what facet_queries has. This format can then be used for the old plain gap format as well.

          <lst name="facet_ranges">
            <lst name="url_length">
              <lst name="counts">
                <int name="[42 TO 45}">1</int>
                <int name="[45 TO 48}">1</int>
                <int name="[51 TO 54}">1</int>
                <int name="[66 TO 69}">1</int>
              </lst>
              <int name="gap">3</int>
              <int name="start">0</int>
              <int name="end">102</int>
            </lst>
            <lst name="bedrooms">
              <lst name="counts">
                <int name="[1 TO *]">12</int>
                <int name="[2 TO *]">31</int>
                <int name="[3 TO *]">26</int>
                <int name="[4 TO *]">9</int>
              </lst>
              <str name="spec">1..*,2..*,3..*,4..*</str>
              <str name="include">all</str>
            </lst>
          </lst>
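          A Python sketch of how the proposed keys could be generated, using the bracket convention of the examples above ("[" / "]" inclusive, "{" / "}" exclusive). The function name is hypothetical, and only the lower/upper values of facet.range.include are modelled ("all" is taken to imply both); "edge" and "outer" remain open questions in this thread.

```python
def range_label(lower, upper, include=("lower",)):
    """Render one bucket as a range-query style key such as "[42 TO 45}".

    "[" / "]" mark an inclusive bound and "{" / "}" an exclusive one.
    Only the lower/upper facet.range.include values are modelled ("all"
    implies both); "edge" and "outer" are left out. None prints as "*".
    """
    inc = set(include)
    if "all" in inc:
        inc |= {"lower", "upper"}
    left = "[" if "lower" in inc else "{"
    right = "]" if "upper" in inc else "}"

    def fmt(value):
        return "*" if value is None else str(value)

    return f"{left}{fmt(lower)} TO {fmt(upper)}{right}"


# The url_length counts from the example: width-3 buckets, one hit each.
counts = {range_label(lo, lo + 3): n
          for lo, n in [(42, 1), (45, 1), (51, 1), (66, 1)]}
```

With the default include=lower this yields the keys [42 TO 45}, [45 TO 48}, [51 TO 54} and [66 TO 69}, and range_label(1, None, ("all",)) gives the [1 TO *] key used for bedrooms.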
          
          Robert Muir added a comment -

          3.4 -> 3.5

          Hoss Man added a comment -

          Jan: I've got to be completely honest here: catching up on this issue, I got really confused and lost by some of your comments and the updated docs.

          This sequence of comments really stands out to me...

          I have no good answer to this, other than inventing some syntax.
          ...
          I think the values facet.range.include=upper/lower is clear. Outer/edge would need some more work/definition.
          ...
          My primary reason for suggesting this is to give users a terse, intuitive syntax for ranges.
          ...
          One thing this improvement needs to tackle is how to return the range buckets in the Response. It will not be enough with the simple range_facet format ... We need something which can return the explicit ranges,

          (emphasis added by me)

          I really liked the simplicity of your earlier proposal, and I agree that it would be really powerful and helpful to give users a terse, intuitive syntax for specifying sequential ranges of variable sizes, but it seems like we're moving away from the syntax being "intuitive" because of the hoops you're having to jump through to treat this as an extension of the existing "facet.range" param in your design.

          I think we really ought to revisit my earlier suggestion to approach this as an entirely new "type" of faceting: not a new plugin or a contrib, but a new first-class type of faceting that FacetComponent would support, right alongside facet.field, facet.query, and facet.range. Let's ignore everything about the existing facet.range.* param syntax and the facet_range response format, and think about what makes the most sense for this feature on its own. If there are ideas from facet.range that make sense to carry over (like facet.range.include), then great, but let's approach it from the "something new that can borrow from facet.range" standpoint instead of the "extension to facet.range that has a bunch of caveats with how facet.range already works" standpoint.

          I mean: if it looks like a duck, walks like a duck, and quacks like a duck, then I'm happy to call it a duck, but in this case it:

          • doesn't make sense with facet.range.other
          • needs special start/end syntax to play nice with facet.range.start/end
          • needs to change the response format

          ...ie: it doesn't look the same, it doesn't walk the same, and it doesn't quack.

          Regardless of whether this functionality becomes part of facet.range or not, I wanted to comment specifically on this idea...

          If all gaps are specified as explicit ranges there is no ambiguity, so we could require all gaps to be explicit ranges if one wants to use it?

          This seems like a really harsh limitation to impose. If explicit ranges can only be used when every range is explicit, then what does this feature add over just using multiple facet.query params? (It might save a few characters, but multiple facet.query params seem more intuitive and easier to read.) I mean: I don't have a solution to propose; it just seems like there's not much point in supporting explicit ranges in that case.

          Having not thought about this issue in almost a month, and revisiting it with (fairly) fresh eyes, and thinking about all the use cases that have been discussed, it seems like the main goals we should address are really:

          • an intuitive syntax for specifying end points for ranges of varying sizes
          • ability to specify range end points using either fixed values or increments
          • ability to specify whether ranges should use sequential end points or overlap relative to some fixed min/max value

          In other words: the only reason (that I know of) why overlapping ranges even came up in this issue was use cases like...

             Price: $0-10, $0-20, $0-50, $0-100
             Date: NOW-1DAY TO NOW, NOW-1MONTH TO NOW, NOW-1YEAR TO NOW
          

          ...there doesn't seem to be much motivation for using overlapping ranges in the "middle" of a sequence, and these types of use cases where all the ranges overlap seem just as important as use cases where the ranges don't overlap at all...

             Price: $0-10, $10-20, $20-50, $50-100
             Date: NOW-1DAY TO NOW, NOW-1MONTH TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
          

          ...so let's try to focus on a syntax that makes both easy, using both fixed and relative values, without worrying about supporting arbitrary overlapping ranges (since I can't think of a use case for it, and it could always be achieved using facet.query)

          So how about something like...

           facet.sequence=<fieldname>
           facet.sequence.spec=[<wild>,]?<val>,<relval>[,<relval>]*[,<wild>]?
           facet.sequence.type=[before|after|between]
           facet.sequence.include=(same as facet.range.include)
          

          Where "relval" would either be a concrete value, or a relative value; the effective sequence has to either increase or decrease consistently or it's an error; and "facet.sequence.type" determines whether the ranges are overlapping ("before" and "after") or not ("between")

          So if you had a spec like this...

           facet.sequence.spec=0,10,+10,50,+50
          

          Then depending on facet.sequence.type you could either get...

           facet.sequence.type=after
               Price: $0-10, $0-20, $0-50, $0-100
           facet.sequence.type=between
               Price: $0-10, $10-20, $20-50, $50-100
           facet.sequence.type=before
               Price: $0-100, $10-100, $20-100, $50-100
          

          "*" could be used at the start or end to indicate that you wanted an unbounded range, but it wouldn't be a factor in determining the "fixed point" used if type was "after" or "before", ie...

           f.price.facet.sequence.spec=*,0,10,+10,50,+50,*
           f.created.facet.sequence.spec=NOW,-1DAY,-1MONTH,-1YEAR
          
           facet.sequence.type=after
               Price: below $0, $0-10, $0-20, $0-50, $0-100, $100 and up
               Created: NOW-1YEAR TO NOW, NOW-1YEAR TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
           facet.sequence.type=between
               Price: below $0, $0-10, $10-20, $20-50, $50-100, $100 and up
               Created: NOW-1DAY TO NOW, NOW-1MONTH TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
           facet.sequence.type=before
               Price: below $0, $0-100, $10-100, $20-100, $50-100, $100 and up
               Created: NOW-1DAY TO NOW, NOW-1MONTH TO NOW, NOW-1YEAR TO NOW
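Once a spec has been expanded into end points, the three proposed types differ only in which bound each range shares. The Java sketch below is a minimal illustration of that, assuming numeric end points already expanded into ascending order (a descending spec such as the `created` one would have to be sorted first). The `*` wildcards only add unbounded outer buckets and never move the fixed point. `SequenceRanges` and its `Type` enum are hypothetical names, not part of any patch.

```java
import java.util.ArrayList;
import java.util.List;

class SequenceRanges {
    enum Type { AFTER, BETWEEN, BEFORE }

    // pts: expanded end points in ascending order (e.g. 0,10,20,50,100).
    // leadingWild / trailingWild model a "*" at the start / end of the spec: each adds an
    // unbounded outer bucket, but never changes the fixed point used by AFTER / BEFORE.
    static List<String> ranges(long[] pts, Type type, boolean leadingWild, boolean trailingWild) {
        if (pts.length < 2) throw new IllegalArgumentException("need at least two end points");
        long min = pts[0];
        long max = pts[pts.length - 1];
        List<String> out = new ArrayList<>();
        if (leadingWild) out.add("* TO " + min);
        for (int i = 1; i < pts.length; i++) {
            switch (type) {
                case AFTER:   out.add(min + " TO " + pts[i]); break;        // shared lower end
                case BETWEEN: out.add(pts[i - 1] + " TO " + pts[i]); break; // consecutive points
                case BEFORE:  out.add(pts[i - 1] + " TO " + max); break;    // shared upper end
            }
        }
        if (trailingWild) out.add(max + " TO *");
        return out;
    }

    public static void main(String[] args) {
        long[] price = {0, 10, 20, 50, 100}; // expansion of *,0,10,+10,50,+50,*
        for (Type t : Type.values()) {
            System.out.println(t + ": " + ranges(price, t, true, true));
        }
    }
}
```

Each printed line corresponds to one of the Price rows above, with "* TO 0" and "100 TO *" standing in for "below $0" and "$100 and up".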
          

          ...if we defined things that way, I think that would remove a lot of the complexity we've been talking about and simplify some of the use cases.

          The only remaining issues that have been brought up (that I can think of) and would still need to be worked out are:

          1) what the response format needs to look like - I'd vote to punt on this until we figure out the semantics.

          2) when exactly ranges are inclusive/exclusive of their endpoints - I think we should be able to reuse the semantics from facet.range.include here, including "edge", if we define ranges involving "*" as "outer" ranges, but we'd need to work through more scenarios to be sure.

          3) what happens if an increment overlaps with an absolute value, i.e. my original example of "10,20,+50,+100,120,150". The three possible solutions I can think of are:

          • fail loudly
          • implement "precedence" rules, i.e. absolute values trump relative values (10-20,20-70,70-120,120-150) or vice versa (10-20,20-70,70-170)
          • implement precedence rules but let them be controlled via a request param (similar to how "facet.range.hardend" works)
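The three options can be compared on the concrete spec above. This is a minimal Java sketch, assuming a numeric, strictly increasing sequence with no `*` wildcards, and assuming that a leading '+' or '-' marks a relative value (so negative absolute values are not supported here). `SpecExpander` and the `Policy` names are hypothetical and do not correspond to any Solr parameter.

```java
import java.util.ArrayList;
import java.util.List;

class SpecExpander {
    // The three options from the comment; these names are hypothetical, not Solr parameters.
    enum Policy { FAIL, ABSOLUTE_WINS, RELATIVE_WINS }

    // Assumption: a leading '+' or '-' marks an increment; anything else is an absolute value.
    static boolean isRelative(String tok) {
        return tok.startsWith("+") || tok.startsWith("-");
    }

    // First absolute value at or after position 'from', or null if there is none.
    static Long nextAbsolute(String[] toks, int from) {
        for (int j = from; j < toks.length; j++) {
            String t = toks[j].trim();
            if (!isRelative(t)) return Long.parseLong(t);
        }
        return null;
    }

    // Expands a numeric, ascending spec such as "10,20,+50,+100,120,150" into end points.
    static List<Long> expand(String spec, Policy policy) {
        String[] toks = spec.split(",");
        List<Long> pts = new ArrayList<>();
        for (int i = 0; i < toks.length; i++) {
            String tok = toks[i].trim();
            boolean rel = isRelative(tok);
            long value = Long.parseLong(tok); // parseLong accepts a leading '+'
            if (pts.isEmpty()) {
                if (rel) throw new IllegalArgumentException("spec must start with an absolute value");
                pts.add(value);
                continue;
            }
            long last = pts.get(pts.size() - 1);
            long candidate = rel ? last + value : value;
            if (rel) {
                Long nextAbs = nextAbsolute(toks, i + 1);
                boolean overshoots = nextAbs != null && candidate > nextAbs;
                if (overshoots && policy == Policy.FAIL)
                    throw new IllegalArgumentException("increment " + tok + " passes absolute value " + nextAbs);
                if (overshoots && policy == Policy.ABSOLUTE_WINS)
                    continue; // drop the increment, keep the later absolute value
            } else if (candidate <= last && policy == Policy.RELATIVE_WINS) {
                continue; // drop an absolute value the increments have already passed
            }
            if (candidate <= last)
                throw new IllegalArgumentException("sequence must be strictly increasing at " + tok);
            pts.add(candidate);
        }
        return pts;
    }

    public static void main(String[] args) {
        String spec = "10,20,+50,+100,120,150";
        System.out.println(expand(spec, Policy.ABSOLUTE_WINS)); // [10, 20, 70, 120, 150]
        System.out.println(expand(spec, Policy.RELATIVE_WINS)); // [10, 20, 70, 170]
        try {
            expand(spec, Policy.FAIL);
        } catch (IllegalArgumentException e) {
            System.out.println("FAIL: " + e.getMessage());
        }
    }
}
```

Consecutive points give the ranges listed above: 10-20, 20-70, 70-120, 120-150 when absolute values win, and 10-20, 20-70, 70-170 when increments win. A spec without conflicts, such as "0,10,+10,50,+50", expands to 0, 10, 20, 50, 100 under every policy.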

          What do you think? Are there any key use cases / features we've talked about that you think this approach overlooks? Do you still think it should really be an extension to "facet.range" ?

          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          email notification suppressed to prevent mass-spam
          pseudo-unique token identifying these issues: hoss20120321nofix36

          Jan Høydahl added a comment -

          Note to self: catch up on this again

          Mandar added a comment -

          I have tried the price range facet in three different ways, but was not able to get it working with variable gaps.

          1) /select?q=*%3A*&facet=true&facet.query=minPrice:[*+TO+500]&facet.query=minPrice:[500+TO+*]

          Returns
          <lst name="facet_queries">
          <int name="minPrice:[* TO 500]">122</int>
          <int name="minPrice:[500 TO *]">5722</int>
          </lst>

          2) /select?q=*%3A*&wt=xml&facet=true&facet.field=minPrice&facet.range=minPrice&f.minPrice.facet.range.start=0&f.minPrice.facet.range.end=10000&f.minPrice.facet.range.gap=1000
          <lst name="minPrice">
          <lst name="counts">
          <int name="0">522</int>
          <int name="1000">1204</int>
          <int name="2000">1077</int>
          <int name="3000">817</int>
          <int name="4000">563</int>
          <int name="5000">302</int>
          <int name="6000">245</int>
          <int name="7000">324</int>
          <int name="8000">112</int>
          <int name="9000">200</int>
          </lst>

          3) /select?q=*%3A*&wt=xml&facet=true&facet.field=minPrice&facet.range=minPrice&f.minPrice.facet.range.start=0&f.minPrice.facet.range.end=10000&f.minPrice.facet.range.gap=1000&facet.range.spec=0,1000,3000,5000

          With query 3 there is no error, but facet.range.spec combined with facet.range does not return the expected variable-width facet results.
          I tried both version 3.6 and 4.0-alpha.

          Is there anything wrong with my query for using range.spec?

          I have even tried f.minPrice.facet.range.gap=1000,2000,3000 and got a parse error.

          Or is range.spec not part of these versions?

          Jan Høydahl added a comment -

          Mandar, since this issue is Unresolved, the feature is not part of any release (yet). There are only patches attached, which may not apply cleanly if they are old.

          Markus Jelsma added a comment -

          Any reason why this issue is off the radar?

          Jeroen Steggink added a comment -

          I'm also very interested in a variable range gap feature.

          Otis Gospodnetic added a comment -

          Markus Jelsma:

          Any reason why this issue is off the radar?

          Because it's old and received a lot of attention, with many very verbose comments that are probably good but hard for people to read, focus on, and understand; it wasn't committed while it was a hot topic, so it has stayed stuck. Maybe Chris Hostetter and Jan Høydahl have the power to get this committed. It does sound like a very useful feature.

          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          solr-user added a comment -

          We are very interested in the gap feature as well. We implemented some custom code to do this for Solr 1.4.x but haven't updated the code to 4.5.x (and probably won't do so for quite some time).

          Benjamin Brandmeier added a comment -

          I'm also interested in this. Currently I'm using lots of facet.query parameters. That works; however, I suspect it could be done more simply, and perhaps more efficiently, with multiple range gap values.
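Until variable gaps are committed, the facet.query workaround (the approach that worked in Mandar's first query) can at least be generated from a list of boundaries. This is a minimal Java sketch, assuming Solr 4.x, whose query parser accepts mixed inclusive/exclusive bounds such as `[0 TO 1000}`, and Java 10+ for `URLEncoder.encode(String, Charset)`. `FacetQueryBuilder` is a hypothetical helper, not part of Solr or SolrJ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FacetQueryBuilder {
    // One facet.query per bucket from ascending boundaries, e.g. 0, 1000, 3000, 5000.
    // "[a TO b}" includes a and excludes b, so adjacent buckets never count a document twice;
    // a final open-ended bucket "[last TO *]" catches everything above the last boundary.
    static List<String> facetQueries(String field, long... bounds) {
        if (bounds.length == 0) throw new IllegalArgumentException("need at least one boundary");
        List<String> qs = new ArrayList<>();
        for (int i = 0; i + 1 < bounds.length; i++) {
            qs.add(field + ":[" + bounds[i] + " TO " + bounds[i + 1] + "}");
        }
        qs.add(field + ":[" + bounds[bounds.length - 1] + " TO *]");
        return qs;
    }

    // URL-encodes each query as a facet.query request parameter.
    static String toParams(List<String> queries) {
        StringBuilder sb = new StringBuilder("facet=true");
        for (String q : queries) {
            sb.append("&facet.query=").append(URLEncoder.encode(q, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> qs = facetQueries("minPrice", 0, 1000, 3000, 5000);
        System.out.println(qs);
        System.out.println("/select?q=*%3A*&" + toParams(qs));
    }
}
```

Each bucket then comes back under its own key in the facet_queries section of the response, so the labels are explicit, at the cost of one query per bucket.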

          Ted Sullivan added a comment -

          At the very least, we should revise the discussion of this feature on the SimpleFacetParameters Wiki page. The Wiki page does contain a disclaimer, "The following section on variable width gaps discusses uncommitted code", but the comment is anchored to "Solr 3.6, Solr4.0", so a person might reasonably expect that it has been released by now (Solr 4.6).

          Ted Sullivan added a comment -

          Updated the patch to the current svn trunk. The old patch does not work anymore because the paths have changed since this was uploaded.

          Shalin Shekhar Mangar added a comment -

          Thanks Ted.

          I think that just having facet.range.gap accept multiple values is a good improvement to start with. We can spin it off to a new issue and commit it. Implementing the full facet.sequence.* feature is clearly a bigger discussion and will happen when someone has the time and inclination. We should not hold up this small improvement while waiting for the bigger one.

          Does anyone have any objections on committing Grant/Ted's patch?
          Attn: Hoss Man, Jan Høydahl, Grant Ingersoll

          Grant Ingersoll added a comment -

          No objections from me, I'll defer to your review, Shalin Shekhar Mangar

          Ted Sullivan added a comment -

          I agree with Jan Høydahl's earlier comment (9/Sep/11):

          One thing this improvement needs to tackle is how to return the range buckets in the Response

          That is, return the explicit ranges, as is done for facet.query. Unless we do this, the response is too cryptic. I would vote for adding this code before committing (I'll volunteer) and spinning off the facet.range.spec or facet.sequence idea to a new issue, as Shalin suggests. So for facet.range.start=0, facet.range.end=1000, facet.range.gap=10,90,900 the labels would be as Jan suggests: [0 TO 10}, [10 TO 100}, [100 TO 1000}. This would be done even if the gaps are constant. As it is now, the response shows only the start of each range rather than the ranges themselves.
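The labels Ted describes can be derived directly from start, end, and the gap list. This is a minimal Java sketch; its semantics are assumptions rather than confirmed behaviour of the attached patch: gaps are applied in order, the last gap is repeated once the list is exhausted, and the final bucket is clamped to facet.range.end. `MultiGapLabels` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class MultiGapLabels {
    // Expands start/end plus a comma-separated gap list into explicit "[lower TO upper}" labels.
    // Assumptions: gaps are used in order, the last gap repeats, and the last bucket stops at end.
    static List<String> buckets(long start, long end, String gapList) {
        String[] parts = gapList.split(",");
        long[] gaps = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            gaps[i] = Long.parseLong(parts[i].trim());
            if (gaps[i] <= 0) throw new IllegalArgumentException("gap must be positive: " + parts[i]);
        }
        List<String> labels = new ArrayList<>();
        long lower = start;
        for (int i = 0; lower < end; i++) {
            long gap = gaps[Math.min(i, gaps.length - 1)]; // reuse the last gap when the list runs out
            long upper = Math.min(lower + gap, end);       // clamp the final bucket to end
            labels.add("[" + lower + " TO " + upper + "}"); // facet.range.include=lower style
            lower = upper;
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(buckets(0, 1000, "10,90,900"));
        // prints [[0 TO 10}, [10 TO 100}, [100 TO 1000}]
    }
}
```

Because the labels carry both bounds, a client no longer has to re-derive the bucket widths from the gap list, which is the main point of Ted's proposal.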

          Jan Høydahl added a comment -

          So for a facet.range.start=0, facet.range.end=1000, facet.range.gap=10,90,900 the labels would be as Jan suggests: [0 TO 10}, [10 TO 100}, [100 TO 1000}.

          Ted Sullivan, I am not in favor of a list of relative gaps; I think it is user-unfriendly. That's why I suggested a new facet.range.spec or something like Hoss' facet.range.buckets. But if for some reason you wish to extend the "gap" parameter, I guess it needs to remain a list of relative gaps, since that is more or less implied by the name?

          Ted Sullivan added a comment -

          Right. I'm following Shalin Shekhar Mangar's suggestion to split out your/Hoss's facet.range.spec / facet.sequence idea as a separate issue. I don't think of this as extending the gap parameter; I am just providing more explicit information in the response about which ranges you actually get (as per your suggestion of Sept/2011), similar to what you would get if you implemented this using facet.query. Looking at the current code, it is pretty easy to add the range information to the response (right now the response labels are just the gap starts). This may be user-unfriendly, as you say, but I would argue that it is friendlier than what we have right now, and it is certainly more developer-friendly because it provides better feedback. There is a lot of interest in this feature (it has been advertised on the SimpleFacetParameters Wiki for some time now), as evidenced by earlier comments in this thread. My original goal was just to make the patch usable for those who want it, by updating Grant's original patch to work with the new modular class organization. The work required to improve the facet.range.gap response is not large. I haven't touched the facet.range.spec/buckets approach, but that would seem to require more effort.

          Grant Ingersoll added a comment -

          +1 on splitting out and moving forward. FWIW, I think the gaps are user-friendly, since I just think about what size my gaps should be. Since no one has stepped up on the other capabilities, I would suggest we move forward with what we have working now.

          Uwe Schindler added a comment -

          Move issue to Solr 4.9.


            People

            • Assignee: Shalin Shekhar Mangar
            • Reporter: Grant Ingersoll
            • Votes: 28
            • Watchers: 29
