Solr
  1. Solr
  2. SOLR-2690

Date Math should allow clients to override timezone used for rounding (faceting & queries)

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      Timezone needs to be taken into account when doing date math. Currently it isn't. DateMathParser instances created are always being constructed with UTC. This is a huge issue when it comes to faceting. Depending on your timezone day-light-savings changes the length of a month. A facet gap of +1MONTH is different depending on the timezone and the time of the year.

      I believe the issue is very simple to fix. There are three places in the code DateMathParser is created. All three are configured with the timezone being UTC. If a user could specify the TimeZone to pass into DateMathParser this faceting issue would be resolved.

      1. add-tz-parameter.patch
        10 kB
        David Schlotfeldt
      2. add-tz-parameter.patch
        10 kB
        David Schlotfeldt
      3. SOLR-2690.patch
        25 kB
        Hoss Man
      4. SOLR-2690.patch
        16 kB
        Hoss Man
      5. SOLR-2690.patch
        7 kB
        Hoss Man
      6. timezone-facet-component.tgz
        13 kB
        Shotaro Kamio

        Issue Links

          Activity

          Hide
          Yonik Seeley added a comment -

          Although this probably isn't a "bug", I agree that handling timezones somehow would be nice.
          We just need to think very carefully about the API so we can support it long term.

          One immediate thought I had was that it would be a pain to specify the timezone everywhere. Even a simple range query would need to specify it twice:
          my_date:["(?timeZone=America/Chicago)NOW/YEAR" TO "(?timeZone=America/Chicago)+1MONTH"]

          So one possible alternative that needs more thought is a "TZ" request parameter that would apply by default to things that are date related.

          Show
          Yonik Seeley added a comment - Although this probably isn't a "bug", I agree that handling timezones somehow would be nice. We just need to think very carefully about the API so we can support it long term. One immediate thought I had was that it would be a pain to specify the timezone everywhere. Even a simple range query would need to specify it twice: my_date: ["(?timeZone=America/Chicago)NOW/YEAR" TO "(?timeZone=America/Chicago)+1MONTH"] So one possible alternative that needs more thought is a "TZ" request parameter that would apply by default to things that are date related.
          Hide
          David Smiley added a comment -

          This issue is similar to SOLR-750 in which I pleaded with Hoss (and lost) to get the "Z" out of the DateMath date literal format since, at the end of the day, Solr really doesn't have time zone support – forcing me to "lie" to Solr about the time zone.

          Show
          David Smiley added a comment - This issue is similar to SOLR-750 in which I pleaded with Hoss (and lost) to get the "Z" out of the DateMath date literal format since, at the end of the day, Solr really doesn't have time zone support – forcing me to "lie" to Solr about the time zone.
          Hide
          David Schlotfeldt added a comment -

          Good point.

          Also, this isn't a bug but if we want a complete solution, we really need a way to specify times in other timezones.

          If I want midnight in Central time zone I shouldn't have to write: 2011-01-01T06:00:00Z
          (Note I wrote 6:00 not 0:00)
          I believe only DateField would have to be modified to make it possible to specify timezone.

          For a complete example if I wanted to facet blog posts by the date posted and the month:

          facet.date=blogPostDate
          facet.date.start=2011-01-01T00:00:00
          facet.date.end=2012-01-01T00:00:00
          facet.date.gap=+1MONTH
          timezone=America/Chicago

          Currently you would need to do the following. (Which actually gives close to correct results but not exact. Again, problem is the gap of +1MONTH doesn't take daylight savings into account so blog posts on the edge of ranges are counted in the wrong range.

          facet.date=blogPostDate
          facet.date.start=2011-01-01T00:06:00Z
          facet.date.end=2012-01-01T00:06:00Z
          facet.date.gap=+1MONTH

          Show
          David Schlotfeldt added a comment - Good point. Also, this isn't a bug but if we want a complete solution, we really need a way to specify times in other timezones. If I want midnight in Central time zone I shouldn't have to write: 2011-01-01T06:00:00Z (Note I wrote 6:00 not 0:00) I believe only DateField would have to be modified to make it possible to specify timezone. For a complete example if I wanted to facet blog posts by the date posted and the month: facet.date=blogPostDate facet.date.start=2011-01-01T00:00:00 facet.date.end=2012-01-01T00:00:00 facet.date.gap=+1MONTH timezone=America/Chicago Currently you would need to do the following. (Which actually gives close to correct results but not exact. Again, problem is the gap of +1MONTH doesn't take daylight savings into account so blog posts on the edge of ranges are counted in the wrong range. facet.date=blogPostDate facet.date.start=2011-01-01T00:06:00Z facet.date.end=2012-01-01T00:06:00Z facet.date.gap=+1MONTH
          Hide
          David Schlotfeldt added a comment -

          By extending FacetComponent (and having to resort to reflection) I added: facet.date.gap.tz

          The new parameter only affects the gap. The math done with processing the gap is the largest issue when it comes it date faceting in my mind.

          I would be more then happy to provide a patch to add this feature.

          No this doesn't address all timezone issues but at least it would address the main issue that makes date faceting, in my eyes, completely useless. I bet there are 100s of people out there using date faceting that don't realize it does NOT give correct results

          Show
          David Schlotfeldt added a comment - By extending FacetComponent (and having to resort to reflection) I added: facet.date.gap.tz The new parameter only affects the gap. The math done with processing the gap is the largest issue when it comes it date faceting in my mind. I would be more then happy to provide a patch to add this feature. No this doesn't address all timezone issues but at least it would address the main issue that makes date faceting, in my eyes, completely useless. I bet there are 100s of people out there using date faceting that don't realize it does NOT give correct results
          Hide
          David Schlotfeldt added a comment -

          Okay I've modified my code to now take "facet.date.tz" instead. The time zone now affects the facet's start, end and gap values.

          Show
          David Schlotfeldt added a comment - Okay I've modified my code to now take "facet.date.tz" instead. The time zone now affects the facet's start, end and gap values.
          Hide
          Hoss Man added a comment -

          If I want midnight in Central time zone I shouldn't have to write: 2011-01-01T06:00:00Z (Note I wrote 6:00 not 0:00)

          "Central time zone" is a vague concept that may mean one thing to you, but may mean something different to someone else. For any arbitrary moment in the (one dimensional) space of time values, there are an infinite number of ways to represent that time as a string (or as number) depending on where you place your origin for the coordinate system. Requiring that clients format times in UTC is no different then requiring clients to use Arabic numerals to represent integers – it's just a matter of making sure there is no ambiguity, and everyone is using the same definition of "0". UTC is a completely unambiguous coordinate system for times, that is guaranteed to work in any JVM that Solr might run on. Even if we added code to allow dates to be expressed in arbitrary user selected timezones, we couldn't make that garuntee.

          Bottom line: the issue of parsing/formatting times in other coordinate systems (ie: timezones) should not be convoluted with the issue of what timezone is used by the DateMathParser when rounding – those are distinct issues. It's completely conceivable to have a QParser that accepts a variety of data formats and "guesses" what TZ is meant and use that QParser in the same request where you want date faceting based on a TZ that is specified distinctly from the query string (ie: user's local TZ is UTC-0700, but they are searching for records dated before "Dec 15, 2010 4:20PM EST")

          So one possible alternative that needs more thought is a "TZ" request parameter that would apply by default to things that are date related.

          Right ... from the beginning DateMathparser was designed with the hope that a TZ/Locale pair could be specified per request (or per field declaration) for driving the rounding/math logic, there was just no sane way to specify an alternative to UTC/US that could be past down into the DateMathParser and used ubiquitously in a request because of the FieldType API.

          (Slight digression...

          its really only essential that we can affect DateMathParser the SimpleFacets uses when dealing with the gap of the date facets.

          ...just changing the TZ used by that instance of DateMathParser for rounding/math isn't going to do any good if the user then tries to filter on one of those constraints and the filter query code winds up using the defaults in DateField (ie: NOW/DAY and NOW/DAY+1HOUR are going to be very differnet things in the facet count code path vs the filter query code path))

          Now that we have SolrRequestInfo and a request param to specify the meaning of "NOW", the same logic could be used to allow a request param to specify the TZ/Locale properties of the DateMathParser as well.

          But like I said: this should really only be used to affect the math in DateMathParser – it should not be used in DateField.parseDate/formatDate because DateField by definition deals with a single canonical time format, by the time the DateField class is involved in dealing with a Date everything should be un-ambiguisly expressable in UTC.

          logic for parsing date strings that aren't in the canonical date format should be a QParser responsibility at query time, or an UpdateProcessor responsibility at index time. Logic for formatting dates in non-canonical format should be a ResponseWriter responsibility. This new request property we're talking about for defining the "users TZ" can certainly be used in all of these places to pick/override defaults, but that type of logic really doesn't belong in DateField.

          Show
          Hoss Man added a comment - If I want midnight in Central time zone I shouldn't have to write: 2011-01-01T06:00:00Z (Note I wrote 6:00 not 0:00) "Central time zone" is a vague concept that may mean one thing to you, but may mean something different to someone else. For any arbitrary moment in the (one dimensional) space of time values, there are an infinite number of ways to represent that time as a string (or as number) depending on where you place your origin for the coordinate system. Requiring that clients format times in UTC is no different then requiring clients to use Arabic numerals to represent integers – it's just a matter of making sure there is no ambiguity, and everyone is using the same definition of "0". UTC is a completely unambiguous coordinate system for times, that is guaranteed to work in any JVM that Solr might run on. Even if we added code to allow dates to be expressed in arbitrary user selected timezones, we couldn't make that garuntee. Bottom line: the issue of parsing/formatting times in other coordinate systems (ie: timezones) should not be convoluted with the issue of what timezone is used by the DateMathParser when rounding – those are distinct issues. It's completely conceivable to have a QParser that accepts a variety of data formats and "guesses" what TZ is meant and use that QParser in the same request where you want date faceting based on a TZ that is specified distinctly from the query string (ie: user's local TZ is UTC-0700, but they are searching for records dated before "Dec 15, 2010 4:20PM EST") So one possible alternative that needs more thought is a "TZ" request parameter that would apply by default to things that are date related. Right ... from the beginning DateMathparser was designed with the hope that a TZ/Locale pair could be specified per request (or per field declaration) for driving the rounding/math logic, there was just no sane way to specify an alternative to UTC/US that could be past down into the DateMathParser and used ubiquitously in a request because of the FieldType API. (Slight digression... its really only essential that we can affect DateMathParser the SimpleFacets uses when dealing with the gap of the date facets. ...just changing the TZ used by that instance of DateMathParser for rounding/math isn't going to do any good if the user then tries to filter on one of those constraints and the filter query code winds up using the defaults in DateField (ie: NOW/DAY and NOW/DAY+1HOUR are going to be very differnet things in the facet count code path vs the filter query code path)) Now that we have SolrRequestInfo and a request param to specify the meaning of "NOW", the same logic could be used to allow a request param to specify the TZ/Locale properties of the DateMathParser as well. But like I said: this should really only be used to affect the math in DateMathParser – it should not be used in DateField.parseDate/formatDate because DateField by definition deals with a single canonical time format, by the time the DateField class is involved in dealing with a Date everything should be un-ambiguisly expressable in UTC. logic for parsing date strings that aren't in the canonical date format should be a QParser responsibility at query time, or an UpdateProcessor responsibility at index time. Logic for formatting dates in non-canonical format should be a ResponseWriter responsibility. This new request property we're talking about for defining the "users TZ" can certainly be used in all of these places to pick/override defaults, but that type of logic really doesn't belong in DateField.
          Hide
          David Schlotfeldt added a comment -

          Being able to specify dates in timezones other then GMT+0 isn't a problem. It would just be nice but we can gnore that.

          The time zone the DateMathParser is configured with is the issue (which it sounds like you understand.) My solution that changes the timezone DateMathParser is constructed with in SimpleFacet to parse start, end and gap isn't ideal. I went this route because I don't want to run a custom built Solr – my solution allowed me to fix the "bug" by simply replacing the "facet" SearchComponent. Affecting all DateMathParsrs created for length of the request is what is really needed (which is what you said). I like your approach.

          It sounds like we are on the same page.

          So, can we get this added?

          Without time zone affecting DateMathParser the date faceting is useless (at least for 100% the situations I would use it for)

          By the way, I'm gald to see how many responses there have been. I'm happy to see how active this project is.

          Show
          David Schlotfeldt added a comment - Being able to specify dates in timezones other then GMT+0 isn't a problem. It would just be nice but we can gnore that. The time zone the DateMathParser is configured with is the issue (which it sounds like you understand.) My solution that changes the timezone DateMathParser is constructed with in SimpleFacet to parse start, end and gap isn't ideal. I went this route because I don't want to run a custom built Solr – my solution allowed me to fix the "bug" by simply replacing the "facet" SearchComponent. Affecting all DateMathParsrs created for length of the request is what is really needed (which is what you said). I like your approach. It sounds like we are on the same page. So, can we get this added? Without time zone affecting DateMathParser the date faceting is useless (at least for 100% the situations I would use it for) By the way, I'm gald to see how many responses there have been. I'm happy to see how active this project is.
          Hide
          David Smiley added a comment -

          Hoss, thanks for elaborating on the distinction between the date literal and the DateMath timezone. I was conflating these issues in my mind – silly me.

          Show
          David Smiley added a comment - Hoss, thanks for elaborating on the distinction between the date literal and the DateMath timezone. I was conflating these issues in my mind – silly me.
          Hide
          Shotaro Kamio added a comment -

          David, we also faced the date facet gap (rounding) issue. If you can post your patch here, it's very helpful.

          Show
          Shotaro Kamio added a comment - David, we also faced the date facet gap (rounding) issue. If you can post your patch here, it's very helpful.
          Hide
          Shotaro Kamio added a comment -

          Thanks for David Schlotfeldt's kindness, I'll post his code here on behalf of him. It works with Solr 3.3.
          Though we need to change the code in order to fit well into solr code base, I post as it is for the first step.

          The attached tgz contains two java sources which create a custom facet component.
          The codes are based on SimpleFacets and FacetComponent classes of solr. The following line should be added to solrconfig.xml to use the custom component.

          <searchComponent name="facet" class="com.plaudit.core.impl.solr.handler.component.PlauditFacetComponent"/>

          As an example request, date facet request with monthly gap in Tokyo time (GMT+9:00) can be like this:

          http://localhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&explanOther=&hl.fl=&
          facet=true&facet.date=manufacturedate_dt&
          facet.date.start=2005-01-31T15:00:00Z&
          facet.date.end=2006-05-31T15:00:00Z&
          facet.date.gap=%2B1MONTH/DAY&
          facet.date.tz=Asia/Tokyo

          Show
          Shotaro Kamio added a comment - Thanks for David Schlotfeldt's kindness, I'll post his code here on behalf of him. It works with Solr 3.3. Though we need to change the code in order to fit well into solr code base, I post as it is for the first step. The attached tgz contains two java sources which create a custom facet component. The codes are based on SimpleFacets and FacetComponent classes of solr. The following line should be added to solrconfig.xml to use the custom component. <searchComponent name="facet" class="com.plaudit.core.impl.solr.handler.component.PlauditFacetComponent"/> As an example request, date facet request with monthly gap in Tokyo time (GMT+9:00) can be like this: http://localhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&explanOther=&hl.fl=& facet=true&facet.date=manufacturedate_dt& facet.date.start=2005-01-31T15:00:00Z& facet.date.end=2006-05-31T15:00:00Z& facet.date.gap=%2B1MONTH/DAY& facet.date.tz=Asia/Tokyo
          Hide
          Shotaro Kamio added a comment - - edited

          Custom facet component code is attached.

          Show
          Shotaro Kamio added a comment - - edited Custom facet component code is attached.
          Hide
          David Schlotfeldt added a comment -

          Adds 'tz' parameter

          Show
          David Schlotfeldt added a comment - Adds 'tz' parameter
          Hide
          David Schlotfeldt added a comment -

          Thank you Shotaro for posting that code for me. The code Shotaro posted is a new version of faceting that makes date ranges take time zones into account which is essential for the reasons specified above.

          I also just made DateField and TrieDateField also take time zone into account. Why do we need this? For example let's say you have "events" in solr. They have a startDate set on them. If we want all events happening today we would want to be able to run the query "+startDate[NOW/DAY TO NOW/DAY+1]" This query will not work as expected since when "/DAY" rounds to the beginning of the day for GMT+0. To solve this yes I could run "+startDate[NOW/DAY+6HOUR TO NOW/DAY+1+6HOUR]" since i am in central timezone. BUT we have daylight savings time so some parts of the year I need it to be +5HOUR instead of +6HOUR. The lack of timezone support makes many date related queries that should be easy hard.

          The comment above is a patch against https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3 (from 8/9/2011 which is what i am running off of) that adds a "tz" parameter that causes the DateMathParser DateField and TrieDateField uses to be configured with the specified timezone.

          (Though not posted here I also modified the component Shotaro shared with you to use "tz" if "facet.date.tz" isn't specified.)

          Show
          David Schlotfeldt added a comment - Thank you Shotaro for posting that code for me. The code Shotaro posted is a new version of faceting that makes date ranges take time zones into account which is essential for the reasons specified above. I also just made DateField and TrieDateField also take time zone into account. Why do we need this? For example let's say you have "events" in solr. They have a startDate set on them. If we want all events happening today we would want to be able to run the query "+startDate [NOW/DAY TO NOW/DAY+1] " This query will not work as expected since when "/DAY" rounds to the beginning of the day for GMT+0. To solve this yes I could run "+startDate [NOW/DAY+6HOUR TO NOW/DAY+1+6HOUR] " since i am in central timezone. BUT we have daylight savings time so some parts of the year I need it to be +5HOUR instead of +6HOUR. The lack of timezone support makes many date related queries that should be easy hard. The comment above is a patch against https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3 (from 8/9/2011 which is what i am running off of) that adds a "tz" parameter that causes the DateMathParser DateField and TrieDateField uses to be configured with the specified timezone. (Though not posted here I also modified the component Shotaro shared with you to use "tz" if "facet.date.tz" isn't specified.)
          Hide
          David Schlotfeldt added a comment -

          Issue with patch. Fixed attached. (Change on line 197 of the patch file)

          Show
          David Schlotfeldt added a comment - Issue with patch. Fixed attached. (Change on line 197 of the patch file)
          Hide
          Nguyen Kien Trung added a comment -

          hi David, I'm one of 100s people having this issue. I applied your patch on 3.3, and modified the SimpleFacetsTest to cover a simple timezone test scenario. However, the tests for DateField and TrieDateField fail. Is there an additional changes need to be made on SimpleFacets?

          SimpleFacetsTest#indexDateFacets
              add_doc(i, "2012", f, "2007-07-30T07:07:07.070Z");
              add_doc(i, "2015", f, "2007-07-30T23:07:07.070Z"); // one more record
          
          SimpleFacetsTest#helpTestDateFacets
              ...
              final String jul4 = rangeMode ? "[.='1'  ]" : "[.='2'  ]";
              
              assertQ("check counts for day of facet by day using UTC timezone",
                      req( "q", "*:*"
                          ,"rows", "0"
                          ,"facet", "true"
                          ,p, f
                          ,p+".start", "2007-07-30T00:00:00.000Z"
                          ,p+".end",   "2007-07-31T00:00:00.000Z"
                          ,p+".gap",   "+1DAY"
                          ,"tz", "UTC"
                          )
                      ,"*[count("+pre+"/int)="+(rangeMode ? 1 : 1)+"]"
                      ,pre+"/int[@name='2007-07-30T00:00:00Z'][.='2']"
                      );
              
              assertQ("check counts for day of facet by day using Asia/Singapore (UTC+8) timezone",
                      req( "q", "*:*"
                          ,"rows", "0"
                          ,"facet", "true"
                          ,p, f
                          ,p+".start", "2007-07-30T00:00:00.000Z"
                          ,p+".end",   "2007-07-31T00:00:00.000Z"
                          ,p+".gap",   "+1DAY"
                          ,"tz", "Asia/Singapore"
                          )
                      ,"*[count("+pre+"/int)="+(rangeMode ? 1 : 1)+"]"
                      ,pre+"/int[@name='2007-07-30T00:00:00Z'][.='1']"
                      );    // fail here, still returns 2 instead of 1, already set tests.timezone parameter to UTC to make sure data indexed in UTC
              ...
          
          Show
          Nguyen Kien Trung added a comment - hi David, I'm one of 100s people having this issue. I applied your patch on 3.3, and modified the SimpleFacetsTest to cover a simple timezone test scenario. However, the tests for DateField and TrieDateField fail. Is there an additional changes need to be made on SimpleFacets? SimpleFacetsTest#indexDateFacets add_doc(i, "2012" , f, "2007-07-30T07:07:07.070Z" ); add_doc(i, "2015" , f, "2007-07-30T23:07:07.070Z" ); // one more record SimpleFacetsTest#helpTestDateFacets ... final String jul4 = rangeMode ? "[.='1' ]" : "[.='2' ]" ; assertQ( "check counts for day of facet by day using UTC timezone" , req( "q" , "*:*" , "rows" , "0" , "facet" , " true " ,p, f ,p+ ".start" , "2007-07-30T00:00:00.000Z" ,p+ ".end" , "2007-07-31T00:00:00.000Z" ,p+ ".gap" , "+1DAY" , "tz" , "UTC" ) , "*[count(" +pre+ "/ int )=" +(rangeMode ? 1 : 1)+ "]" ,pre+ "/ int [@name='2007-07-30T00:00:00Z'][.='2']" ); assertQ( "check counts for day of facet by day using Asia/Singapore (UTC+8) timezone" , req( "q" , "*:*" , "rows" , "0" , "facet" , " true " ,p, f ,p+ ".start" , "2007-07-30T00:00:00.000Z" ,p+ ".end" , "2007-07-31T00:00:00.000Z" ,p+ ".gap" , "+1DAY" , "tz" , "Asia/Singapore" ) , "*[count(" +pre+ "/ int )=" +(rangeMode ? 1 : 1)+ "]" ,pre+ "/ int [@name='2007-07-30T00:00:00Z'][.='1']" ); // fail here, still returns 2 instead of 1, already set tests.timezone parameter to UTC to make sure data indexed in UTC ...
          Hide
          David Schlotfeldt added a comment -

          Nguyen - Sorry for the slow response. I've been on vacation where I had no Internet. My brain is still on vacation so I might be wrong, but I believe the problem is your aren't rounding the start and end.

          I believe this will leave the time part of the ranges as "00:00:00.00Z" (GMT). The timezone specified by 'tz' simply affects math done with dates (this includes when the gap is incremented).
          start = 2007-07-30T00:00:00.000Z
          end = 2007-07-31T00:00:00.000Z
          gap = +1DAY
          tz = Asia/Singapore

          One way to get what you want, which is the way people have been saying to do it, is to adjust the start and end manually. (I believe Singapore is GMT+8)
          start = 2007-07-30T00:08:00.000Z
          end = 2007-07-31T00:08:00.000Z
          gap = +1DAY
          tz = Asia/Singapore

          The issue with this approach is your "manual" adjusting gets tricking when talking about timezones that have day light savings time. This is one place the patch makes things SOOOO much easier.

          So you want to change your test to simply add '/DAY' to the end of each which will round the times to the beginning of the day. Since they rounding takes tz into account it will come out to 8:00.
          start = 2007-07-30T00:00:00.000Z/DAY
          end = 2007-07-31T00:00:00.000Z/DAY
          gap = +1DAY
          tz = Asia/Singapore

          Let me know if that helps. (Again, brain not in programming mode yet so I may have completely misunderstood the issue.)

          Show
          David Schlotfeldt added a comment - Nguyen - Sorry for the slow response. I've been on vacation where I had no Internet. My brain is still on vacation so I might be wrong, but I believe the problem is your aren't rounding the start and end. I believe this will leave the time part of the ranges as "00:00:00.00Z" (GMT). The timezone specified by 'tz' simply affects math done with dates (this includes when the gap is incremented). start = 2007-07-30T00:00:00.000Z end = 2007-07-31T00:00:00.000Z gap = +1DAY tz = Asia/Singapore One way to get what you want, which is the way people have been saying to do it, is to adjust the start and end manually. (I believe Singapore is GMT+8) start = 2007-07-30T00:08:00.000Z end = 2007-07-31T00:08:00.000Z gap = +1DAY tz = Asia/Singapore The issue with this approach is your "manual" adjusting gets tricking when talking about timezones that have day light savings time. This is one place the patch makes things SOOOO much easier. So you want to change your test to simply add '/DAY' to the end of each which will round the times to the beginning of the day. Since they rounding takes tz into account it will come out to 8:00. start = 2007-07-30T00:00:00.000Z/DAY end = 2007-07-31T00:00:00.000Z/DAY gap = +1DAY tz = Asia/Singapore Let me know if that helps. (Again, brain not in programming mode yet so I may have completely misunderstood the issue.)
          Hide
          Hoss Man added a comment -

          David: I like the theory of your patch, but the implementation choices you made seem a little heavy handed. In particular I don't understand the need for your ExecuteWithThreadDateMathParser – as i mentioned, the DateMathParser class already uses SolrRequestInfo to define the default "NOW", we can use similar logic to define a default TZ w/o introducing as much complexity.

          I whipped up a quick patch that takes this approach – please take a look and lemme know what you think. I did some quick manual testing and everything seems to be working, but obviously we need some good unit tests before we can commit (not sure when i'll have a chance to work on this so feel free to jump in if you want)

          The timezone specified by 'tz' simply affects math done with dates

          Right – this is really the only sane way for this to work – if the client specifies an absolute time for start/end when faceting, we have to use that absolute time, we can't assume they mean for that to be "rounded" in some way relative the client's timezone – rounded to what precision? Day? Month? Year?).

          Another digression...

          One way to get what you want, which is the way people have been saying to do it, is to adjust the start and end manually. ... The issue with this approach is your "manual" adjusting gets tricking when talking about timezones that have day light savings time.

          You should never "manually adjust" times before sending them to solr – If you have an absolute abstract moment in time, then you should format that abstract moment of time as string using then canonical solr date format (which requires the use of UTC in the format). If you are starting with a string representation of some absolute abstract moment of time that has been formated using some other arbitrary format and/or timezone, you should parse it into an abstract moment in time (using a date parsing library that knows about the rules of your timezone – any decent one should know about all the timezone data), and then format that abstract moment of time as string using the canonical solr date format.

          Show
          Hoss Man added a comment - David: I like the theory of your patch, but the implementation choices you made seem a little heavy handed. In particular I don't understand the need for your ExecuteWithThreadDateMathParser – as i mentioned, the DateMathParser class already uses SolrRequestInfo to define the default "NOW", we can use similar logic to define a default TZ w/o introducing as much complexity. I whipped up a quick patch that takes this approach – please take a look and lemme know what you think. I did some quick manual testing and everything seems to be working, but obviously we need some good unit tests before we can commit (not sure when i'll have a chance to work on this so feel free to jump in if you want) The timezone specified by 'tz' simply affects math done with dates Right – this is really the only sane way for this to work – if the client specifies an absolute time for start/end when faceting, we have to use that absolute time, we can't assume they mean for that to be "rounded" in some way relative the client's timezone – rounded to what precision? Day? Month? Year?). Another digression... One way to get what you want, which is the way people have been saying to do it, is to adjust the start and end manually. ... The issue with this approach is your "manual" adjusting gets tricking when talking about timezones that have day light savings time. You should never "manually adjust" times before sending them to solr – If you have an absolute abstract moment in time, then you should format that abstract moment of time as string using then canonical solr date format (which requires the use of UTC in the format). If you are starting with a string representation of some absolute abstract moment of time that has been formated using some other arbitrary format and/or timezone, you should parse it into an abstract moment in time (using a date parsing library that knows about the rules of your timezone – any decent one should know about all the timezone data), and then format that abstract moment of time as string using the canonical solr date format.
          Hide
          Hoss Man added a comment -

          editing summary & description to clarify this isn't just about faceting, but date math in general.

          Show
          Hoss Man added a comment - editing summary & description to clarify this isn't just about faceting, but date math in general.
          Hide
          Hoss Man added a comment -

          updated patch with tests.

          still a few TODOs and nocommits related to validating the TZ param: TimeZone.getTimeZone happily accepts gibberish and returns GMT instead – which would be really confusing if Solr is running on a server with an older tzdata file and the client tried to specify some relatively new timezone, silently getting GMT instead.

          There is a TimeZone.getAvailableIDs method that we could use to do a quick check, but that would only cover named TimeZones (ie: "America/Los_Angeles") so we have to explicitly validate if it's a legal "CustomID" (ie: "GMT+/-\d+) ... need another 30 minutes or so at some point to wrap that logic up into a static utility that can be used by both SolrRequestInfo and the test classes

          Show
          Hoss Man added a comment - updated patch with tests. still a few TODOs and nocommits related to validating the TZ param: TimeZone.getTimeZone happily accepts gibberish and returns GMT instead – which would be really confusing if Solr is running on a server with an older tzdata file and the client tried to specify some relatively new timezone, silently getting GMT instead. There is a TimeZone.getAvailableIDs method that we could use to do a quick check, but that would only cover named TimeZones (ie: "America/Los_Angeles") so we have to explicitly validate if it's a legal "CustomID" (ie: "GMT+/-\d+) ... need another 30 minutes or so at some point to wrap that logic up into a static utility that can be used by both SolrRequestInfo and the test classes
          Hide
          Hoss Man added a comment -

          Updated patch, adds TimeZoneUtils (with tests) to do what TimeZone.getTimeZone should have done in the first place.

          I think this is ready to go.

          any feedback from anyone on the overall approach?

          Show
          Hoss Man added a comment - Updated patch, adds TimeZoneUtils (with tests) to do what TimeZone.getTimeZone should have done in the first place. I think this is ready to go. any feedback from anyone on the overall approach?
          Hide
          Hoss Man added a comment -

          Committed revision 1329837.

          Show
          Hoss Man added a comment - Committed revision 1329837.
          Hide
          David Schlotfeldt added a comment -

          Sorry I'm not familiar with Solr's SVN layout. What SVN location do I look at? The trunk? A branch?

          What files were changed? Are all the changes you made in revision 1329837

          Show
          David Schlotfeldt added a comment - Sorry I'm not familiar with Solr's SVN layout. What SVN location do I look at? The trunk? A branch? What files were changed? Are all the changes you made in revision 1329837
          Hide
          Hoss Man added a comment -

          David: r1329837 was the primary commit yes, but there were some subsequent commits to fix the tests.

          you can see all commits related to this issue (and all files modified by those ccommits) by clicking the "All" sub-tab under "Activity" (above the list of comments)

          https://wiki.apache.org/solr/HowToContribute#JIRA_tips_.28our_issue.2BAC8-bug_tracker.29

          Show
          Hoss Man added a comment - David: r1329837 was the primary commit yes, but there were some subsequent commits to fix the tests. you can see all commits related to this issue (and all files modified by those ccommits) by clicking the "All" sub-tab under "Activity" (above the list of comments) https://wiki.apache.org/solr/HowToContribute#JIRA_tips_.28our_issue.2BAC8-bug_tracker.29

            People

            • Assignee:
              Hoss Man
              Reporter:
              David Schlotfeldt
            • Votes:
              3 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 3h
                3h
                Remaining:
                Remaining Estimate - 3h
                3h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development