Solr
  1. Solr
  2. SOLR-470

DateField throws error on iso8601 date

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.3
    • Component/s: search
    • Labels:
      None

      Description

      A correct iso 8601 date 2006-01-01T12:01:00Z throws an error.
      Unparseable date: "2006-01-01T12:01:00Z" at org.apache.solr.schema.DateField.toObject(DateField.java:173) at org.apache.solr.schema.DateField.toObject(DateField.java:83)

      The ThreadLocalDateFormat requires fractional seconds "yyyy-MM-dd'T'HH:mm:ss.SSS"
      to parse with simple date format. Where as the jdoc states their optional.

      1. SOLR-470.patch
        28 kB
        Hoss Man
      2. SOLR-470.patch
        31 kB
        Hoss Man
      3. SOLR-470.patch
        21 kB
        Hoss Man
      4. SOLR-470.patch
        8 kB
        Hoss Man
      5. SOLR-470.patch
        2 kB
        Noble Paul

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          a quick glance at the code verifies this (to me anyway ... i haven't written a test case for it)

          I believe the problem only exists when dealing with DateMath relative to an absolute date ... when indexing or querying on a pure date value (ie: ending with a Z) the SimpleDateFormatter isn't used, so the milliseconds are still optional.

          The ThreadLocalDateFormat helper was introduced in Solr 1.2, but this bug shouldn't affect 1.2 because it didn't support date math relative anything except "NOW"

          The root of the problem is that when i added the ThreadLocalDateFormat before 1.2, i tried to make it clear in the javadocs it was only for formatting, not for parsing...

            /**
             * Returns a formatter that can be use by the current thread if needed to
             * convert Date objects to the Internal representation.
             */
            protected DateFormat getThreadLocalDateFormat() {
          

          ....but when i generalized the DateMath support i forgot to read my own javadoc and starte using it for parsing as well.

          There's no simple way (that i know of) to make a SimpleDateFormatter support optional milliseconds. So the most straight forward fix is probably to use a (Parsing) Formatter that stops reading before the milliseconds, and then add them manually – this means two seperate SimpleDateFormatters, (one for parsing and one for formatting) but that's "OK" since we never said the existing one was safe for parsing anyway.

          Show
          Hoss Man added a comment - a quick glance at the code verifies this (to me anyway ... i haven't written a test case for it) I believe the problem only exists when dealing with DateMath relative to an absolute date ... when indexing or querying on a pure date value (ie: ending with a Z) the SimpleDateFormatter isn't used, so the milliseconds are still optional. The ThreadLocalDateFormat helper was introduced in Solr 1.2, but this bug shouldn't affect 1.2 because it didn't support date math relative anything except "NOW" The root of the problem is that when i added the ThreadLocalDateFormat before 1.2, i tried to make it clear in the javadocs it was only for formatting, not for parsing... /** * Returns a formatter that can be use by the current thread if needed to * convert Date objects to the Internal representation. */ protected DateFormat getThreadLocalDateFormat() { ....but when i generalized the DateMath support i forgot to read my own javadoc and starte using it for parsing as well. There's no simple way (that i know of) to make a SimpleDateFormatter support optional milliseconds. So the most straight forward fix is probably to use a (Parsing) Formatter that stops reading before the milliseconds, and then add them manually – this means two seperate SimpleDateFormatters, (one for parsing and one for formatting) but that's "OK" since we never said the existing one was safe for parsing anyway.
          Hide
          patrick o'leary added a comment -

          It's a little iffy but what if you just default the dateFields to have 0 milliseconds
          unless specified.
          e.g.

          org.apache.solr.util.DateMathParser

           public Date parseMath(String math) throws ParseException {
          
              Calendar cal = Calendar.getInstance(zone, loc);
              cal.setTime(getNow());
              cal.add(Calendar.Calendar.MILLISECOND, 0);
          ...................
          

          And if a millisecond field is added, it over writes the default ?

          Show
          patrick o'leary added a comment - It's a little iffy but what if you just default the dateFields to have 0 milliseconds unless specified. e.g. org.apache.solr.util.DateMathParser public Date parseMath( String math) throws ParseException { Calendar cal = Calendar.getInstance(zone, loc); cal.setTime(getNow()); cal.add(Calendar.Calendar.MILLISECOND, 0); ................... And if a millisecond field is added, it over writes the default ?
          Hide
          Hoss Man added a comment -

          I'm notsure i understand your comment ... i don't see any reason why this bug would have anything to do with DateMathParser – which only deals with the "math" portion of an input string (ie: "/HOUR", "+2YEARS", "+6MONTHS+3DAYS/DAY" etc...) and evaluates them relative some concept of "now" – the problem seems to fall squarely on DateField.toObject which assumes it can call call getThreadLocalDateFormat().parse(...) even though that DateFormat is only intended for formatting.

          Even if we changed DateMathParser as you described i don't see how that would stop the parser from throwing the "Invalid Date in Date Math String:..." exception.

          Show
          Hoss Man added a comment - I'm notsure i understand your comment ... i don't see any reason why this bug would have anything to do with DateMathParser – which only deals with the "math" portion of an input string (ie: "/HOUR", "+2YEARS", "+6MONTHS+3DAYS/DAY" etc...) and evaluates them relative some concept of "now" – the problem seems to fall squarely on DateField.toObject which assumes it can call call getThreadLocalDateFormat().parse(...) even though that DateFormat is only intended for formatting. Even if we changed DateMathParser as you described i don't see how that would stop the parser from throwing the "Invalid Date in Date Math String:..." exception.
          Hide
          patrick o'leary added a comment -

          parseMath returns a Date object that's formated to the SDF in the LocalThread that's fine /scratch last un-caffeinated comment.

          I think the root of the problem is any call to 'toInternal(String)' with a string that end's in 'Z' is indexed directly, which now means there can be multiple date formats represented internally in the index.

          But in saying that, I'm guessing that a Field does not necessarily have to be indexed to call toObject.

          What would be nice is to switch from using SimpleDateFormat to org.apache.commons.lang.time.DateUtils
          Which supports parseDate(String str, String[] parsePatterns) , where multiple patterns can be tried, and using
          init to allow users supply their own patterns.

          And then using org.apache.commons.lang.time.FastDateFormat to represent the dates internally.

          Both of these are thread safe, which should make a Flexible DateField a lot easier to implement without
          much over head.

          Show
          patrick o'leary added a comment - parseMath returns a Date object that's formated to the SDF in the LocalThread that's fine /scratch last un-caffeinated comment. I think the root of the problem is any call to 'toInternal(String)' with a string that end's in 'Z' is indexed directly, which now means there can be multiple date formats represented internally in the index. But in saying that, I'm guessing that a Field does not necessarily have to be indexed to call toObject. What would be nice is to switch from using SimpleDateFormat to org.apache.commons.lang.time.DateUtils Which supports parseDate(String str, String[] parsePatterns) , where multiple patterns can be tried, and using init to allow users supply their own patterns. And then using org.apache.commons.lang.time.FastDateFormat to represent the dates internally. Both of these are thread safe, which should make a Flexible DateField a lot easier to implement without much over head.
          Hide
          Hoss Man added a comment -

          note: in addition to DateField.toObject(String) (which is used when there is DateMath) DateField.toObject(Fieldable) also has the same bug...

          http://www.nabble.com/Unparseable-date-td15854401.html

          Show
          Hoss Man added a comment - note: in addition to DateField.toObject(String) (which is used when there is DateMath) DateField.toObject(Fieldable) also has the same bug... http://www.nabble.com/Unparseable-date-td15854401.html
          Hide
          Noble Paul added a comment -

          A simple patch which can parse a date that ends with 'Z'

          Show
          Noble Paul added a comment - A simple patch which can parse a date that ends with 'Z'
          Hide
          Hoss Man added a comment -

          Paul: there are a few things that concern me about your patch...

          • it isn't back compatible (changes the sig of getThreadLocalDateFormat)
          • it doesn't address the root problem. The issue isn't Z or no Z, it's millis or no millis (SimpleDateFormat.parse will happily ignore extra stuff at end of string if it's not in the format, so specifying the Z isn't needed) ... what is needed is a parser that doesn't require the millis – which you have, but drawing the distinction between Z and not Z isn't correct.
          • it doesn't really work: toObject is always called on the indexed form which never has a Z at the end, so your new DateFormat isn't used ... and you'll still get an error if a date without millis is in the index.
          Show
          Hoss Man added a comment - Paul: there are a few things that concern me about your patch... it isn't back compatible (changes the sig of getThreadLocalDateFormat) it doesn't address the root problem. The issue isn't Z or no Z, it's millis or no millis (SimpleDateFormat.parse will happily ignore extra stuff at end of string if it's not in the format, so specifying the Z isn't needed) ... what is needed is a parser that doesn't require the millis – which you have, but drawing the distinction between Z and not Z isn't correct. it doesn't really work: toObject is always called on the indexed form which never has a Z at the end, so your new DateFormat isn't used ... and you'll still get an error if a date without millis is in the index.
          Hide
          Hoss Man added a comment -

          alternate patch that addresses the problem in the way i described back in my first comment ... along with some test cases.

          The parsing isn't pretty, but it's fairly straightforward (and doesn't add any dependencies on third party libraries)

          Show
          Hoss Man added a comment - alternate patch that addresses the problem in the way i described back in my first comment ... along with some test cases. The parsing isn't pretty, but it's fairly straightforward (and doesn't add any dependencies on third party libraries)
          Hide
          Noble Paul added a comment -

          hoss: The approach is fine and if we need to refine it it can be taken up later. The fix is important to be checked in as we have made the binary format the default for distributed search.

          Show
          Noble Paul added a comment - hoss: The approach is fine and if we need to refine it it can be taken up later. The fix is important to be checked in as we have made the binary format the default for distributed search.
          Hide
          Hoss Man added a comment -

          I agree that it needs to be fixed, but i want to make sure it gets fixed correctly – i would prefer we come up with a patch that also addresses SOLR-552 before committing half cocked.

          Show
          Hoss Man added a comment - I agree that it needs to be fixed, but i want to make sure it gets fixed correctly – i would prefer we come up with a patch that also addresses SOLR-552 before committing half cocked.
          Hide
          Hoss Man added a comment -

          checkpoint ... better test cases that take into account SOLR-552 as well.

          This still isn't completed because it doesn't deal with formating correctly (see failing test cases) ... attaching in case i get hit by a bus before i have time to work on it again.

          in this approach, i'm attempting to make the parsing "forgiving" of trailing zeros in the millis, but strict about ensuring that the indexed form is always the canonical per the docs.

          As a result, i introduced a LegacyDateField for people who relied on the old broken behavior that anything ending with a Z was "ok"

          Reminder to self: if applying patch to commit, do an "svn copy DateField.java LegacyDateField.java; rm LegacyDateField.java" before applying patch.

          Show
          Hoss Man added a comment - checkpoint ... better test cases that take into account SOLR-552 as well. This still isn't completed because it doesn't deal with formating correctly (see failing test cases) ... attaching in case i get hit by a bus before i have time to work on it again. in this approach, i'm attempting to make the parsing "forgiving" of trailing zeros in the millis, but strict about ensuring that the indexed form is always the canonical per the docs. As a result, i introduced a LegacyDateField for people who relied on the old broken behavior that anything ending with a Z was "ok" Reminder to self: if applying patch to commit, do an "svn copy DateField.java LegacyDateField.java; rm LegacyDateField.java" before applying patch.
          Hide
          Hoss Man added a comment -

          builds on the previous patch to include fixes for the formatting issues, code clean up, better tests (LegacyDateFieldTest is now standalone, and can be droped into Solr1.2 to regress against the old DateField), and better documentation ... addresses SOLR-552 and SOLR-544 as well.

          I think this is ready to commit, but i'd like to see some positive feedback before proceeding – both on the implementation, and the documentation changes. (something approximating the class docs for LegacyDateField should show up in CHANGES.txt's upgrading section as well).

          Show
          Hoss Man added a comment - builds on the previous patch to include fixes for the formatting issues, code clean up, better tests (LegacyDateFieldTest is now standalone, and can be droped into Solr1.2 to regress against the old DateField), and better documentation ... addresses SOLR-552 and SOLR-544 as well. I think this is ready to commit, but i'd like to see some positive feedback before proceeding – both on the implementation, and the documentation changes. (something approximating the class docs for LegacyDateField should show up in CHANGES.txt's upgrading section as well).
          Hide
          Noble Paul added a comment -

          The patch did not apply properly on trunk. SVN cound not fetc the given version of LegacyDateField

          The implementation looks fine

          Show
          Noble Paul added a comment - The patch did not apply properly on trunk. SVN cound not fetc the given version of LegacyDateField The implementation looks fine
          Hide
          Hoss Man added a comment -

          Grr... apparently "svn diff" does really foolish things in conjunction with "svn copy"

          This new patch should apply much cleaner (although there will probably still be a stray javadoc hunk because of $Id$ interpolation)

          Show
          Hoss Man added a comment - Grr... apparently "svn diff" does really foolish things in conjunction with "svn copy" This new patch should apply much cleaner (although there will probably still be a stray javadoc hunk because of $Id$ interpolation)
          Hide
          Hoss Man added a comment -

          Committed revision 658003.

          variation on last patch that included some javadoc formating, spelling, and grammar fixes

          Show
          Hoss Man added a comment - Committed revision 658003. variation on last patch that included some javadoc formating, spelling, and grammar fixes

            People

            • Assignee:
              Hoss Man
              Reporter:
              patrick o'leary
            • Votes:
              1 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development