Solr / SOLR-750

DateField.parseMath doesn't handle non-existent Z

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I've run into situations when trying to use SOLR-540 (wildcard highlight spec) in which it attempts to highlight a date field, and I get a stack trace from DateField.parseMath puking because there isn't a "Z" at the end of an otherwise good date-time string. It was very easy to fix the code to make it react gracefully to a missing Z. Attached is the patch. This bug isn't really related to SOLR-540, so please apply it without waiting for 540.

          Activity

          Hoss Man added a comment -

          Solr does have timezone support: it supports UTC ... that may sound like a cop-out answer, but it's true. DateField specifies that it only accepts UTC formatted dates, and it stores the UTC date in the index. It knows that a date it receives as input is UTC because it ends in "Z".

          In the future, DateField might start allowing documents to be indexed with alternate timezone specifiers, and convert to UTC internally before writing to the index; or new options might get added at some point to allow query clients to specify what timezone they are in, and solr could convert all the internal dates to that timezone for them, etc...

          ...if/when features like those get implemented, they can only work if there is a standardized internal format, and at the moment the only way DateField can ensure that there is a standardized internal format is if it forces the clients updating the index to only send UTC dates.

          "If I'm using Solr and want to feed it dates in a particular time zone, or perhaps a local-time of day, and clients expect this, then why should Solr force me to specify a timezone? I find it irritating."

          There's nothing to stop you from lying to Solr about the timezone. If all of the update/search clients for your instance are in on the secret that the times are really GMT-0730 even though Solr thinks they are UTC, then no one gets hurt.

          But if we dropped the requirement that date inputs have the "Z" suffix, people would assume they can index stuff like 1995-12-31T23:59:59-07:30 and then be confused when it doesn't work.
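For clients that do hold dates with a real offset, one option consistent with the comment above is to convert to UTC on the client before sending the value to Solr. The following is a minimal sketch using only the standard java.time API. The class and method names (UtcDateExample, toUtcString) are hypothetical and are not part of Solr.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

final class UtcDateExample {

    /**
     * Converts an ISO-8601 date-time carrying an explicit offset
     * (for example "-07:30") into the equivalent UTC instant,
     * rendered with the trailing "Z" that DateField expects.
     * Hypothetical client-side helper, not a Solr API.
     */
    static String toUtcString(String isoWithOffset) {
        Instant instant = OffsetDateTime.parse(isoWithOffset).toInstant();
        // Instant.toString() always renders in UTC with a "Z" suffix.
        return instant.toString();
    }

    public static void main(String[] args) {
        // The example value from the comment above.
        System.out.println(toUtcString("1995-12-31T23:59:59-07:30"));
        // prints 1996-01-01T07:29:59Z
    }
}
```

Doing the conversion on the client keeps the index in a single, standardized representation, which is the property the comment argues for.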

          David Smiley added a comment -

          Ignoring the long-gone circumstances in which I encountered and reported this issue originally...
          I do feel strongly that Solr shouldn't force me to specify a Z when Solr doesn't really have any time zone support. And as such it shouldn't emit the "Z" in date output either. If I'm using Solr and want to feed it dates in a particular time zone, or perhaps a local-time of day, and clients expect this, then why should Solr force me to specify a timezone? I find it irritating.

          Hoss Man added a comment -

          The exception is correct – that is an invalid date string (as far as being input to parseMath, toInternal, or DateField.getAnalyzer().tokenStream is concerned)

          The SOLR-540 patch is doing something it shouldn't be (which seems likely since it makes absolutely no sense to try and highlight a DateField) and/or the Highlighter has a bug (why is getBestTextFragments passing an indexed token to an Analyzer?)

          Either way: parseMath is doing the right thing.

          David Smiley added a comment -

          Here's the relevant snippet of the stack trace that occurs.

          20:27:30,169 ERROR [SolrCore] org.apache.solr.common.SolrException: Invalid Date String:'2008-08-27T06:44:13.000'
          	at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
          	at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
          	at org.apache.solr.schema.FieldType$DefaultAnalyzer$1.next(FieldType.java:315)
          	at org.apache.solr.highlight.TokenOrderingFilter.next(DefaultSolrHighlighter.java:389)
          	at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:91)
          	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:230)
          	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:310)
          	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:83)
          	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
          

          (I'm a fan of SOLR-540 because I've got LOTS of fields and don't want to enumerate each of them.)

          Hoss Man added a comment -

          I'm not really sure why the code would try to highlight a date field (sounds like a bug in the SOLR-540 patch, and yet another great example of why I'm opposed to things like SOLR-540) but this patch doesn't really make sense to me either ... the "Z" is not optional. It is a mandatory part of the input format.

          When dates are indexed the internal representation doesn't include the 'Z' but the internal representation is not valid input for the parseMath method.
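The rule described here, that an input date must end in "Z" and that a string without it is rejected, can be illustrated with a small standalone sketch. This is not Solr's actual parseMath implementation. The class and method names (StrictUtcParser, parseUtc) are hypothetical, the sketch uses java.time rather than Solr's own parsing code, and it ignores date math expressions entirely.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

final class StrictUtcParser {

    /**
     * Parses a date string only if it carries the mandatory "Z" suffix.
     * Hypothetical illustration of the input rule, not Solr code.
     */
    static Instant parseUtc(String value) {
        if (!value.endsWith("Z")) {
            // Mirrors the wording of the error in the stack trace on this issue.
            throw new IllegalArgumentException("Invalid Date String:'" + value + "'");
        }
        try {
            return Instant.parse(value);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Invalid Date String:'" + value + "'", e);
        }
    }

    public static void main(String[] args) {
        // Valid input: explicit UTC marker.
        System.out.println(parseUtc("2008-08-27T06:44:13.000Z"));

        // The string from the reported stack trace has no "Z" and is rejected.
        try {
            parseUtc("2008-08-27T06:44:13.000");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```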


            People

            • Assignee: Unassigned
            • Reporter: David Smiley
            • Votes: 0
            • Watchers: 1

            Time Tracking

            • Estimated: 0.25h
            • Remaining: 0.25h
            • Logged: Not Specified
