Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3
    • Component/s: None
    • Labels:
      None

      Description

      1) Allow clients to express concepts like...

      • "give me facet counts per day for every day this month."
      • "give me facet counts per hour for every hour of today."
      • "give me facet counts per hour for every hour of a specific day."
      • "give me facet counts per hour for every hour of a specific day and give me facet counts for the
        number of matches before that day, or after that day."
        2) Return all data in a way that makes it easy to use to build filter queries on those date ranges.
      1. date_facets.patch
        21 kB
        Hoss Man
      2. date_facets.patch
        21 kB
        Hoss Man
      3. date_facets.patch
        18 kB
        Hoss Man
      4. date_facets.patch
        12 kB
        Hoss Man
      5. date_facets.patch
        12 kB
        Hoss Man
      6. date_facets.patch
        7 kB
        Hoss Man
      7. date_facets.patch
        7 kB
        Hoss Man

        Activity

        Hide
        Hoss Man added a comment -

        First pass, no tests but the basics work ... i'm not all that happy about the cleanliness of the code yet (particularly the back and forth of format conversions).

        Params...

        • facet.date = FIELD_NAME ... multivalued
        • (f. FIELD_NAME.)facet.date.start = DATE ... single value, per field overridable, date that supports "NOW" style date math strings
        • (f. FIELD_NAME.)facet.date.end = DATE ... single value, per field overridable, date that supports "NOW" style date math strings
        • (f. FIELD_NAME.)facet.date.gap = DATE_MATH_STR ... single value, per field overridable, date math string (ie: "+1DAY")
        • (f. FIELD_NAME.)facet.date.other = pre | post | inner | all ... multivalue, per field overridable, string indicating what "other" info we want about the range:
          o pre = the count of matches before the start date
          o post = the count of matches after the end date
          o inner = the count of all matches between start and end
          o all = all of the above (default value)
          o none = no additional info requested.

        Still Todo...

        1) add the support for facet.date.other (simple to do, just ran out of time to day)
        2) Date parsing needs to be enhanced to support date math on full date strings, not just "NOW" that way if a client knows they are using facet.date.gap of +1DAY and they get back a facet count for 1995-12-31T00:00:00.000Z they can easily generate a filter query for field:[1995-12-31T00:00:00.000Z TO 1995-12-31T00:00:00.000Z+1DAY] to restrict their results
        3) rethink some of the "ft.toExternal(ft.toInternal(...))" type stuff going on to ensure any date math strings are parsed... some of this may make sense as lower level methods in the DateField class

        Questions I'm not sure about...

        • how much should we worry about gaps not dividing evenly between start/end ... right now every range is exactly "gap" wide ... even if it goes past the "end". Should the last gap end at "end" no matter what? (how would we return that info in a way that's easy to parse and make a filter query out of)
        • ranges currently include both end poinst ... Ideally we'd include one end but not the other (so no overlap) but that makes filterqueries to restrict by those rnages hard (even though queryparser supports inclusive or exclusive ranges it doesn't support a mix/match of inclusive on one side and exclusive on the other). maybe we can have an "interval" param which defaults to one millisecond so ranges can allways be inclusive and still not overlap?
        • what should happen if "end < start" or "gap < 0" ... maybe those should be okay as long as both are true.
        • should we support hardcoded default values for start, end, and gap? ... for start it's easy to get the lowest value in the field, but what about gap and end?
        Show
        Hoss Man added a comment - First pass, no tests but the basics work ... i'm not all that happy about the cleanliness of the code yet (particularly the back and forth of format conversions). Params... facet.date = FIELD_NAME ... multivalued (f. FIELD_NAME.)facet.date.start = DATE ... single value, per field overridable, date that supports "NOW" style date math strings (f. FIELD_NAME.)facet.date.end = DATE ... single value, per field overridable, date that supports "NOW" style date math strings (f. FIELD_NAME.)facet.date.gap = DATE_MATH_STR ... single value, per field overridable, date math string (ie: "+1DAY") (f. FIELD_NAME.)facet.date.other = pre | post | inner | all ... multivalue, per field overridable, string indicating what "other" info we want about the range: o pre = the count of matches before the start date o post = the count of matches after the end date o inner = the count of all matches between start and end o all = all of the above (default value) o none = no additional info requested. Still Todo... 1) add the support for facet.date.other (simple to do, just ran out of time to day) 2) Date parsing needs to be enhanced to support date math on full date strings, not just "NOW" that way if a client knows they are using facet.date.gap of +1DAY and they get back a facet count for 1995-12-31T00:00:00.000Z they can easily generate a filter query for field: [1995-12-31T00:00:00.000Z TO 1995-12-31T00:00:00.000Z+1DAY] to restrict their results 3) rethink some of the "ft.toExternal(ft.toInternal(...))" type stuff going on to ensure any date math strings are parsed... some of this may make sense as lower level methods in the DateField class Questions I'm not sure about... how much should we worry about gaps not dividing evenly between start/end ... right now every range is exactly "gap" wide ... even if it goes past the "end". Should the last gap end at "end" no matter what? (how would we return that info in a way that's easy to parse and make a filter query out of) ranges currently include both end poinst ... Ideally we'd include one end but not the other (so no overlap) but that makes filterqueries to restrict by those rnages hard (even though queryparser supports inclusive or exclusive ranges it doesn't support a mix/match of inclusive on one side and exclusive on the other). maybe we can have an "interval" param which defaults to one millisecond so ranges can allways be inclusive and still not overlap? what should happen if "end < start" or "gap < 0" ... maybe those should be okay as long as both are true. should we support hardcoded default values for start, end, and gap? ... for start it's easy to get the lowest value in the field, but what about gap and end?
        Hide
        Hoss Man added a comment -

        no functionality changes, just reivsed to work with current trunk (547393)

        Show
        Hoss Man added a comment - no functionality changes, just reivsed to work with current trunk (547393)
        Hide
        Hoss Man added a comment -

        revised patch with added functionality...

        1) facet.date.other now supported
        2) the value of "gap" always explicitly returned for each field so all of the dates used as keys can be made into filter queries because...
        3) DateField enhanced to support DateMath parsing of arbitrary dates (ie: 1995-12-31T23:59:59.999Z+5MINUTES)

        still haven't answered any of hte open questions before, nor have i cleaned up the usage of things like ft.toExternal(ft.toInternal(...)) in SimpleFacets

        (it still bugs me, but i'd rather not refactor until i have some test cases)

        Show
        Hoss Man added a comment - revised patch with added functionality... 1) facet.date.other now supported 2) the value of "gap" always explicitly returned for each field so all of the dates used as keys can be made into filter queries because... 3) DateField enhanced to support DateMath parsing of arbitrary dates (ie: 1995-12-31T23:59:59.999Z+5MINUTES) still haven't answered any of hte open questions before, nor have i cleaned up the usage of things like ft.toExternal(ft.toInternal(...)) in SimpleFacets (it still bugs me, but i'd rather not refactor until i have some test cases)
        Hide
        Hoss Man added a comment -

        no functional changes:

        • updated to work against trunk
        • changed name of toExternal method to toObejct to be more consistent with recent trunk additions.
        Show
        Hoss Man added a comment - no functional changes: updated to work against trunk changed name of toExternal method to toObejct to be more consistent with recent trunk additions.
        Hide
        Hoss Man added a comment -

        patch now includes unit tests, as well as a bug fix i discovered for the pre/inner/post logic after writing the test.

        Show
        Hoss Man added a comment - patch now includes unit tests, as well as a bug fix i discovered for the pre/inner/post logic after writing the test.
        Hide
        Hoss Man added a comment -

        i'd like to commit this in the next few days baring any objections.

        in particular, feedback on the API (ie: query params) would be good ... the internals can always be cleaned up later if people don't like them, but the query args should be sanity checked before people start using them.

        Show
        Hoss Man added a comment - i'd like to commit this in the next few days baring any objections. in particular, feedback on the API (ie: query params) would be good ... the internals can always be cleaned up later if people don't like them, but the query args should be sanity checked before people start using them.
        Hide
        Ryan McKinley added a comment -

        This looks great Hoss. Thanks!

        The facet param interface look reasonable. Structurally, I would like to see 'component' based params split into their own file - FacetParams should be similar to HighlightParams. It seems funny to munge the get/set field bit with the expanding list of things we may get or set. If we implement FacetParams as an interface (like HighlightParams), the deprecated class o.a.s.request.SolrParams could implement FacetParams.

        One thing to note on FacetDateOther.get( string ), if you put in an invalid string, you will get IllegalArgumentException or NullPointer - not a 400 response code. Perhaps somethign like:

        public enum FacetDateOther {
        PRE, POST, INNER, ALL, NONE;
        public String toString()

        { return super.toString().toLowerCase(); }

        public static FacetDateOther get(String label) {
        try

        { return valueOf(label.toUpperCase()); }

        catch( Exception ex )

        { throw new SolrException (SolrException.ErrorCode.BAD_REQUEST, label +" is not a valid type of 'other' date facet information", ex ); }

        }
        }

        Personally, I like the sound of "before", "after" and "between" better then "pre" "post" "inner". before/after seem to sit nicely with the other parameters 'start' 'end'.

        Show
        Ryan McKinley added a comment - This looks great Hoss. Thanks! The facet param interface look reasonable. Structurally, I would like to see 'component' based params split into their own file - FacetParams should be similar to HighlightParams. It seems funny to munge the get/set field bit with the expanding list of things we may get or set. If we implement FacetParams as an interface (like HighlightParams), the deprecated class o.a.s.request.SolrParams could implement FacetParams. One thing to note on FacetDateOther.get( string ), if you put in an invalid string, you will get IllegalArgumentException or NullPointer - not a 400 response code. Perhaps somethign like: public enum FacetDateOther { PRE, POST, INNER, ALL, NONE; public String toString() { return super.toString().toLowerCase(); } public static FacetDateOther get(String label) { try { return valueOf(label.toUpperCase()); } catch( Exception ex ) { throw new SolrException (SolrException.ErrorCode.BAD_REQUEST, label +" is not a valid type of 'other' date facet information", ex ); } } } Personally, I like the sound of "before", "after" and "between" better then "pre" "post" "inner". before/after seem to sit nicely with the other parameters 'start' 'end'.
        Hide
        Pieter Berkel added a comment -

        I've just tried this patch and the results are impressive!

        I agree with Ryan regarding the naming of 'pre', 'post' and 'inner', using simple concrete words will make it easier for developers to understand the basic concepts. At first I was a little confused how the 'gap' parameter was used, perhaps a name like 'interval' would be more indicative of it's purpose?

        While on the topic of gaps / intervals, I can imagine a case where one might want facet counts over non-linear intervals, for instance obtaining results from: "Last 7 days", "Last 30 days", "Last 90 days", "Last 6 months". Obviously you can achieve this by setting facet.date.gap=+1DAY and then post-process the results, but a much more elegant solution would be to allow "facet.date.gap" (or another suitably named param) to accept a (comma-delimited) set of explicit partition dates:

        facet.date.start=NOW-6MONTHS/DAY
        facet.date.end=NOW/DAY
        facet.date.gap=NOW-90DAYS/DAY,NOW-30DAYS/DAY,NOW-7DAYS/DAY

        It would then be trivial to calculate facet counts for the ranges specified above.

        It would be useful to make the 'start' an 'end' parameters optional. If not specified 'start' should default to the earliest stored date value, and 'end' should default to the latest stored date value (assuming that's possible). Probably should return a 400 if 'gap' is not set.

        My personal opinion is that 'end' should be a hard limit, the last gap should never go past 'end'. Given that the facet label is always generated from the lower value in the range, I don't think truncating the last 'gap' will cause problems, however it may be helpful to return the actual date value for "end" if it was specified as a offset of NOW.

        What might be a problem is when both start and end dates are specified as offsets of NOW, the value of NOW may not be constant for both values. In one of my tests, I set:

        facet.date.start=NOW-12MONTHS
        facet.date.end=NOW
        facet.date.gap=+1MONTH

        With some extra debugging output I can see that mostly the value of NOW is the same:

        <str name="start">2006-07-13T06:06:07.397</str>
        <str name="end">2007-07-13T06:06:07.397</str>

        However occasionally there is a difference:

        <str name="start">2006-07-13T05:48:23.014</str>
        <str name="end">2007-07-13T05:48:23.015</str>

        This difference alters the number of gaps calculated (+1 when NOW values are diff for start & end). Not sure how this could be fixed, but as you mentioned above, it will probably involve changing "ft.toExternal(ft.toInternal(...))".

        Thanks again for creating this useful addition, I'll try to test it a bit more and see if I can find anything else.

        Show
        Pieter Berkel added a comment - I've just tried this patch and the results are impressive! I agree with Ryan regarding the naming of 'pre', 'post' and 'inner', using simple concrete words will make it easier for developers to understand the basic concepts. At first I was a little confused how the 'gap' parameter was used, perhaps a name like 'interval' would be more indicative of it's purpose? While on the topic of gaps / intervals, I can imagine a case where one might want facet counts over non-linear intervals, for instance obtaining results from: "Last 7 days", "Last 30 days", "Last 90 days", "Last 6 months". Obviously you can achieve this by setting facet.date.gap=+1DAY and then post-process the results, but a much more elegant solution would be to allow "facet.date.gap" (or another suitably named param) to accept a (comma-delimited) set of explicit partition dates: facet.date.start=NOW-6MONTHS/DAY facet.date.end=NOW/DAY facet.date.gap=NOW-90DAYS/DAY,NOW-30DAYS/DAY,NOW-7DAYS/DAY It would then be trivial to calculate facet counts for the ranges specified above. It would be useful to make the 'start' an 'end' parameters optional. If not specified 'start' should default to the earliest stored date value, and 'end' should default to the latest stored date value (assuming that's possible). Probably should return a 400 if 'gap' is not set. My personal opinion is that 'end' should be a hard limit, the last gap should never go past 'end'. Given that the facet label is always generated from the lower value in the range, I don't think truncating the last 'gap' will cause problems, however it may be helpful to return the actual date value for "end" if it was specified as a offset of NOW. What might be a problem is when both start and end dates are specified as offsets of NOW, the value of NOW may not be constant for both values. In one of my tests, I set: facet.date.start=NOW-12MONTHS facet.date.end=NOW facet.date.gap=+1MONTH With some extra debugging output I can see that mostly the value of NOW is the same: <str name="start">2006-07-13T06:06:07.397</str> <str name="end">2007-07-13T06:06:07.397</str> However occasionally there is a difference: <str name="start">2006-07-13T05:48:23.014</str> <str name="end">2007-07-13T05:48:23.015</str> This difference alters the number of gaps calculated (+1 when NOW values are diff for start & end). Not sure how this could be fixed, but as you mentioned above, it will probably involve changing "ft.toExternal(ft.toInternal(...))". Thanks again for creating this useful addition, I'll try to test it a bit more and see if I can find anything else.
        Hide
        Hoss Man added a comment -

        1) i'm happy to break out the FacetParams into their own interface ... but i'd like to track that in a separate refactoring commit (since the existing facet params are already in SolrParams)

        2) i clearly anticipated the FacetDateOther.get( bogus ) problem .. but for some reason i thought it returned null ... i'll fix that.

        3) i actually considered before, between, and after originally but decided they were too long (i was trying to find a way to make "start" shorter as well ... but two people thinking there better convinces me.

        4) my hesitation about renaming "gap" to "interval" is that i wanted to leave the door open for a sperate "interval" option (to define a "gap between the gaps" so to speak) later should it be desired ... see the "questions" i listed when opening the bug.

        5) i don't think this code makes sense for non-linear intervals ... the problem i'm really trying to solve here is using 3 params to express equal date divisions across an arbitrarily long time scale. for the example you listed simple facet.query options probably make more sense

        (allthough you do have me now thinking that a another good faceting option would be some new "facet.range" where many values can be specified, they all get sorted and then ranges are built between each successive value ... bt that should be a seperate issue)

        6) i want to make start and end optional, but for now i can't think of a clean/fast way to do end ... and we can always add defaults later.

        7) my prefrence is for every count to cover a range of exactly "gap" but i can definitely see where having a hard cutoff of "end" is usefull, so i'll make it an option ... name suggestions?

        i'll make sure to echo the value of "end" as well so it's easy to build filter queries for that last range ... probably should have it anyway to build filter queries on between and after.

        should the ranges used to compute the between and after counts depend on where the last range ended or on the literal "end" param?

        8) the NOW variance really bugs me ... back when i built DateMathParser i anticipated this by making the parser have a fixed concept of NOW which could be used to parse multiple strings but i don't kow why i didn't consider it when working on this new patch.
        the real problem is that right now DateField is relied on to do all hte parsing, and a single instance can't have a fixed notion of "NOW" ... it builds a new DateMathParser each time ... i think i'm going ot have to do some heavily refactoring to fix this, which is annoying – but i don't want to commit without fixing this, even if it takes a while any bug that can produce an "off by 1 millisecond" discrepancy should die a horrible horrible freaking death.

        Show
        Hoss Man added a comment - 1) i'm happy to break out the FacetParams into their own interface ... but i'd like to track that in a separate refactoring commit (since the existing facet params are already in SolrParams) 2) i clearly anticipated the FacetDateOther.get( bogus ) problem .. but for some reason i thought it returned null ... i'll fix that. 3) i actually considered before, between, and after originally but decided they were too long (i was trying to find a way to make "start" shorter as well ... but two people thinking there better convinces me. 4) my hesitation about renaming "gap" to "interval" is that i wanted to leave the door open for a sperate "interval" option (to define a "gap between the gaps" so to speak) later should it be desired ... see the "questions" i listed when opening the bug. 5) i don't think this code makes sense for non-linear intervals ... the problem i'm really trying to solve here is using 3 params to express equal date divisions across an arbitrarily long time scale. for the example you listed simple facet.query options probably make more sense (allthough you do have me now thinking that a another good faceting option would be some new "facet.range" where many values can be specified, they all get sorted and then ranges are built between each successive value ... bt that should be a seperate issue) 6) i want to make start and end optional, but for now i can't think of a clean/fast way to do end ... and we can always add defaults later. 7) my prefrence is for every count to cover a range of exactly "gap" but i can definitely see where having a hard cutoff of "end" is usefull, so i'll make it an option ... name suggestions? i'll make sure to echo the value of "end" as well so it's easy to build filter queries for that last range ... probably should have it anyway to build filter queries on between and after. should the ranges used to compute the between and after counts depend on where the last range ended or on the literal "end" param? 8) the NOW variance really bugs me ... back when i built DateMathParser i anticipated this by making the parser have a fixed concept of NOW which could be used to parse multiple strings but i don't kow why i didn't consider it when working on this new patch. the real problem is that right now DateField is relied on to do all hte parsing, and a single instance can't have a fixed notion of "NOW" ... it builds a new DateMathParser each time ... i think i'm going ot have to do some heavily refactoring to fix this, which is annoying – but i don't want to commit without fixing this, even if it takes a while any bug that can produce an "off by 1 millisecond" discrepancy should die a horrible horrible freaking death.
        Hide
        Ryan McKinley added a comment -

        >
        > but i'd like to track that in a separate refactoring commit (since the existing facet params are already in SolrParams)

        sounds good.

        > ... originally but decided they were too long ..

        In general, I favor longer self explanatory param names over short ones. It is kind of annoying to have to look up 'pf', 'bq' to decode what it means.

        • - -

        Again, this is really great. Now we can build the ubiquitous calendar widget from solr.

        Thanks!

        Show
        Ryan McKinley added a comment - > > but i'd like to track that in a separate refactoring commit (since the existing facet params are already in SolrParams) sounds good. > ... originally but decided they were too long .. In general, I favor longer self explanatory param names over short ones. It is kind of annoying to have to look up 'pf', 'bq' to decode what it means. - - Again, this is really great. Now we can build the ubiquitous calendar widget from solr. Thanks!
        Hide
        Yonik Seeley added a comment -

        wrt "NOW", DateMathParser constructor could pass in "now", and take it from SolrQueryRequest.getStartTime()... the big problem being that I doubt the SolrQueryRequest is always available everywhere it's needed.

        Of course, if you had the request, you could just use it's context to stash a DateMathParser too.

        a ThreadLocal would be another (much less desirable) approach.

        Show
        Yonik Seeley added a comment - wrt "NOW", DateMathParser constructor could pass in "now", and take it from SolrQueryRequest.getStartTime()... the big problem being that I doubt the SolrQueryRequest is always available everywhere it's needed. Of course, if you had the request, you could just use it's context to stash a DateMathParser too. a ThreadLocal would be another (much less desirable) approach.
        Hide
        Hoss Man added a comment -

        > the big problem being that I doubt the SolrQueryRequest is always available everywhere it's needed.

        ...exactly, at the moment all of the date parsing is done inside DateField.

        i think i'll try refactoring it so that DateMathParser does all the parsing, and make DateField delegate to it in the non-trivial case.

        the problem that's still a pain to solve is getting all concepts of "NOW" to be the samefor a request ... things like an fq=f:[NOW * NOW+1DAY] are handled by DateField via a query parser ... i can't think of easy way to make that consistent with the facet parsing definition of "NOW" (without resorting to a ThreadLocal)

        Show
        Hoss Man added a comment - > the big problem being that I doubt the SolrQueryRequest is always available everywhere it's needed. ...exactly, at the moment all of the date parsing is done inside DateField. i think i'll try refactoring it so that DateMathParser does all the parsing, and make DateField delegate to it in the non-trivial case. the problem that's still a pain to solve is getting all concepts of "NOW" to be the samefor a request ... things like an fq=f: [NOW * NOW+1DAY] are handled by DateField via a query parser ... i can't think of easy way to make that consistent with the facet parsing definition of "NOW" (without resorting to a ThreadLocal)
        Hide
        Tristan Vittorio added a comment -

        > 4) my hesitation about renaming "gap" to "interval" is that i wanted to leave the door open
        > for a sperate "interval" option (to define a "gap between the gaps" so to speak) later
        > should it be desired ... see the "questions" i listed when opening the bug.

        I do like the idea of having an "interval" between "gaps", however to me it would make more sense to reverse the meanings of these parameters to have "gaps" between "intervals". Regardless, as long as it's clearly documented, it shouldn't make any difference what you name them.

        > 5) i don't think this code makes sense for non-linear intervals ...

        It might be better to keep the logic simple and as-is for now so you can commit it. Having a "facet.range" parameter or some way to specify multiple date facets on a single field would be useful in the future.

        > 7) my prefrence is for every count to cover a range of exactly "gap" but i can definitely see where
        > having a hard cutoff of "end" is usefull, so i'll make it an option ... name suggestions?

        Just thinking through this further, rather than specifying both start and end times, it might be more precise to specify a single start time, a gap, and a gap "count" (how many "gaps" to include), this will avoid the problem of the last "gap" going past the "end" date.

        I find it much easier to criticize other people's naming conventions than to come up with good ones myself, however I'll offer "hardend" (true | false) as an interim name, hopefully someone can think of a better one.

        > i'll make sure to echo the value of "end" as well so it's easy to build filter queries for that last range ...
        > probably should have it anyway to build filter queries on between and after.

        It might be helpful to output the value of "start" also, especially if it was specified as an offset of NOW.

        > should the ranges used to compute the between and after counts depend on where the last range ended or on the literal "end" param?

        I suppose this will depend on the value of "hardend", if true then use the "end" value, otherwise use the end of the last gap.

        > 8) the NOW variance really bugs me ...

        Sounds like a pretty nasty problem affecting more than just this date facet. I know Solr is not a RDBMS, but I always assumed that NOW would be constant throughout the life of a query. Definitely something to think about as a seperate issue though.

        Show
        Tristan Vittorio added a comment - > 4) my hesitation about renaming "gap" to "interval" is that i wanted to leave the door open > for a sperate "interval" option (to define a "gap between the gaps" so to speak) later > should it be desired ... see the "questions" i listed when opening the bug. I do like the idea of having an "interval" between "gaps", however to me it would make more sense to reverse the meanings of these parameters to have "gaps" between "intervals". Regardless, as long as it's clearly documented, it shouldn't make any difference what you name them. > 5) i don't think this code makes sense for non-linear intervals ... It might be better to keep the logic simple and as-is for now so you can commit it. Having a "facet.range" parameter or some way to specify multiple date facets on a single field would be useful in the future. > 7) my prefrence is for every count to cover a range of exactly "gap" but i can definitely see where > having a hard cutoff of "end" is usefull, so i'll make it an option ... name suggestions? Just thinking through this further, rather than specifying both start and end times, it might be more precise to specify a single start time, a gap, and a gap "count" (how many "gaps" to include), this will avoid the problem of the last "gap" going past the "end" date. I find it much easier to criticize other people's naming conventions than to come up with good ones myself, however I'll offer "hardend" (true | false) as an interim name, hopefully someone can think of a better one. > i'll make sure to echo the value of "end" as well so it's easy to build filter queries for that last range ... > probably should have it anyway to build filter queries on between and after. It might be helpful to output the value of "start" also, especially if it was specified as an offset of NOW. > should the ranges used to compute the between and after counts depend on where the last range ended or on the literal "end" param? I suppose this will depend on the value of "hardend", if true then use the "end" value, otherwise use the end of the last gap. > 8) the NOW variance really bugs me ... Sounds like a pretty nasty problem affecting more than just this date facet. I know Solr is not a RDBMS, but I always assumed that NOW would be constant throughout the life of a query. Definitely something to think about as a seperate issue though.
        Hide
        Hoss Man added a comment -

        > it might be more precise to specify a single start time, a gap, and a gap "count"
        > (how many "gaps" to include), this will avoid the problem of the last "gap"
        > going past the "end" date.

        that would eliminate the DateMathParser value add for ... right now you can hardcode "start=NOW/MONTH&end=NOW/MONTH+1MONTH&gap=+1DAY" and get counts per day for the current month – no matter how many days are in the current month ... if we changed it so a param said how many counts to compute they couldn't be specified in the solrconfig and the client would have to be a lot smarter (and might as well use explicit date params since it has to know the current month to know the number of days)

        > It might be helpful to output the value of "start" also,
        > especially if it was specified as an offset of NOW.

        that's already in the output ... each count is labeled by the lower bound of it's range, so the label of the first count is the start .. but i guess there's no harm in being explicit about it.

        Show
        Hoss Man added a comment - > it might be more precise to specify a single start time, a gap, and a gap "count" > (how many "gaps" to include), this will avoid the problem of the last "gap" > going past the "end" date. that would eliminate the DateMathParser value add for ... right now you can hardcode "start=NOW/MONTH&end=NOW/MONTH+1MONTH&gap=+1DAY" and get counts per day for the current month – no matter how many days are in the current month ... if we changed it so a param said how many counts to compute they couldn't be specified in the solrconfig and the client would have to be a lot smarter (and might as well use explicit date params since it has to know the current month to know the number of days) > It might be helpful to output the value of "start" also, > especially if it was specified as an offset of NOW. that's already in the output ... each count is labeled by the lower bound of it's range, so the label of the first count is the start .. but i guess there's no harm in being explicit about it.
        Hide
        Pieter Berkel added a comment - - edited

        Sorry that last comment was from me (not Tristan), not posted from my regular computer. I'll be more careful to post as myself and not as a colleague in future (I was wondering why JIRA didn't ask me to login, d'oh).

        Show
        Pieter Berkel added a comment - - edited Sorry that last comment was from me (not Tristan), not posted from my regular computer. I'll be more careful to post as myself and not as a colleague in future (I was wondering why JIRA didn't ask me to login, d'oh).
        Hide
        Hoss Man added a comment -

        checkpoint...

        • renamed pre/post/inner to before/after/between
        • added a new facet.date.hardend param (with test additions)

        ...still need to tackle the "NOW" inconsistency issue.

        Show
        Hoss Man added a comment - checkpoint... renamed pre/post/inner to before/after/between added a new facet.date.hardend param (with test additions) ...still need to tackle the "NOW" inconsistency issue.
        Hide
        Hoss Man added a comment -

        fixed the the NOW issue by refactoring the toExternal(toInternal()) logic into a new DateField.parseMath(Date,String) method ... a DateMathParser is still used internally to deal with teh math parsing aspects, but i wanted to leave the assumptions about the date format in the DateField class itself.

        comments/critique about this approach welcome.

        Show
        Hoss Man added a comment - fixed the the NOW issue by refactoring the toExternal(toInternal()) logic into a new DateField.parseMath(Date,String) method ... a DateMathParser is still used internally to deal with teh math parsing aspects, but i wanted to leave the assumptions about the date format in the DateField class itself. comments/critique about this approach welcome.
        Hide
        Pieter Berkel added a comment -

        Looking good Hoss, the NOW issue seems to be resolved and the results look consistent after a quick test.

        > * what should happen if "end < start" or "gap < 0" ... maybe those should be okay as long as both are true.

        It is probably wise to explicitly check for ("end < start" XOR "gap < 0") and return an error if so, otherwise the request gets caught in an infinite loop.

        Just on the subject of errors, I notice that exceptions thrown by the date facet code are caught in SimpleFacets.getFacetCounts() and written out in the response:

        try

        { res.add("facet_queries", getFacetQueryCounts()); res.add("facet_fields", getFacetFieldCounts()); res.add("facet_dates", getFacetDateCounts()); }

        catch (Exception e)

        { SolrException.logOnce(SolrCore.log, "Exception during facet counts", e); res.add("exception", SolrException.toStr(e)); }

        This doesn't seem very consistent the way other handlers deal with exceptions (i.e. http response code > 400), is there any reason why it is done this way in SimpleFacets?

        I also think it would also be a good idea to merge "facet_dates" response field into "facet_fields" so that all the facet data in the response is stored in the one location, how feasible would it be to do this?

        Show
        Pieter Berkel added a comment - Looking good Hoss, the NOW issue seems to be resolved and the results look consistent after a quick test. > * what should happen if "end < start" or "gap < 0" ... maybe those should be okay as long as both are true. It is probably wise to explicitly check for ("end < start" XOR "gap < 0") and return an error if so, otherwise the request gets caught in an infinite loop. Just on the subject of errors, I notice that exceptions thrown by the date facet code are caught in SimpleFacets.getFacetCounts() and written out in the response: try { res.add("facet_queries", getFacetQueryCounts()); res.add("facet_fields", getFacetFieldCounts()); res.add("facet_dates", getFacetDateCounts()); } catch (Exception e) { SolrException.logOnce(SolrCore.log, "Exception during facet counts", e); res.add("exception", SolrException.toStr(e)); } This doesn't seem very consistent the way other handlers deal with exceptions (i.e. http response code > 400), is there any reason why it is done this way in SimpleFacets? I also think it would also be a good idea to merge "facet_dates" response field into "facet_fields" so that all the facet data in the response is stored in the one location, how feasible would it be to do this?
        Hide
        Hoss Man added a comment -

        > It is probably wise to explicitly check for ("end < start" XOR "gap < 0") and return an error

        yeah ... good point.

        > Just on the subject of errors, I notice that exceptions thrown by the date facet code are
        > caught in SimpleFacets.getFacetCounts() and written out in the response:
        ...
        > This doesn't seem very consistent the way other handlers deal with exceptions (i.e. http
        > response code > 400), is there any reason why it is done this way in SimpleFacets?

        SimpleFacets isn't a handler, it's just a utility class that other handlers can use.

        the original idea behind catching the errors and adding them to the response is that even if a problem happens while generating facet counts, that's just auxiliary data – and the main result set is (probably) still useful, so let the request finish successfully so the client can decide what to do.

        > I also think it would also be a good idea to merge "facet_dates" response field into
        > "facet_fields" so that all the facet data in the response is stored in the one location, how
        > feasible would it be to do this?

        facet_dates and facet_fields are both children of single parent (facet_counts) just like facet_queries ... but they are in their own sub sections because the meaning and usecase are different ... if they were all lumped together you couldn't pragmatically know what each of the children were.

        Show
        Hoss Man added a comment - > It is probably wise to explicitly check for ("end < start" XOR "gap < 0") and return an error yeah ... good point. > Just on the subject of errors, I notice that exceptions thrown by the date facet code are > caught in SimpleFacets.getFacetCounts() and written out in the response: ... > This doesn't seem very consistent the way other handlers deal with exceptions (i.e. http > response code > 400), is there any reason why it is done this way in SimpleFacets? SimpleFacets isn't a handler, it's just a utility class that other handlers can use. the original idea behind catching the errors and adding them to the response is that even if a problem happens while generating facet counts, that's just auxiliary data – and the main result set is (probably) still useful, so let the request finish successfully so the client can decide what to do. > I also think it would also be a good idea to merge "facet_dates" response field into > "facet_fields" so that all the facet data in the response is stored in the one location, how > feasible would it be to do this? facet_dates and facet_fields are both children of single parent (facet_counts) just like facet_queries ... but they are in their own sub sections because the meaning and usecase are different ... if they were all lumped together you couldn't pragmatically know what each of the children were.
        Hide
        Hoss Man added a comment -

        strange i could have sworn i resolved this issue ... commited over a month ago.

        Show
        Hoss Man added a comment - strange i could have sworn i resolved this issue ... commited over a month ago.
        Hide
        Hoss Man added a comment -

        This bug was modified as part of a bulk update using the criteria...

        • Marked "Resolved" and "Fixed"
        • Had no "Fix Version" versions
        • Was listed in the CHANGES.txt for 1.3 as of today 2008-03-15

        The Fix Version for all 29 issues found was set to 1.3, email notification was suppressed to prevent excessive email.

        For a list of all the issues modified, search jira comments for this (hopefully) unique string: batch20070315hossman1

        Show
        Hoss Man added a comment - This bug was modified as part of a bulk update using the criteria... Marked "Resolved" and "Fixed" Had no "Fix Version" versions Was listed in the CHANGES.txt for 1.3 as of today 2008-03-15 The Fix Version for all 29 issues found was set to 1.3, email notification was suppressed to prevent excessive email. For a list of all the issues modified, search jira comments for this (hopefully) unique string: batch20070315hossman1

          People

          • Assignee:
            Hoss Man
            Reporter:
            Hoss Man
          • Votes:
            1 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development