SOLR-544

Dates with "optional" milliseconds are not equivalent

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0, 1.2, 1.3
    • Fix Version/s: 1.3
    • Component/s: None
    • Labels:
      None

      Description

      Something that occurred to me while working on SOLR-470 is that since the earliest versions of Solr, "DateField" has advertised its format as...

      date field shall be of the form "1995-12-31T23:59:59Z" The trailing "Z" designates UTC time and is mandatory. Optional fractional seconds are allowed: "1995-12-31T23:59:59.999Z" All other parts are mandatory.

      The problem is that Solr has always remained happily ignorant about whether you were using milliseconds or not, even in the case of "0" milliseconds, so the following input strings do not result in Terms which are truly equal...

      • 1995-12-31T23:59:59Z
      • 1995-12-31T23:59:59.0Z
      • 1995-12-31T23:59:59.00Z
      • 1995-12-31T23:59:59.000Z

      ...which means if people are inconsistent about how they interact with DateField (sometimes including the millis and sometimes not including them) they can get incorrect behavior in various situations:

      • sorting by date with a secondary sort can cause the secondary sort to be ignored when the dates should be considered equal.
      • range queries might miss items equal to the end points but with fewer/more characters than the input
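The range-query symptom can be reproduced with plain string comparison. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes the indexed term is simply the input text with the trailing "Z" removed and compared lexicographically, which may not match the exact internals of DateField.

```java
import java.util.List;

final class DateTermComparison {
    // Assumption for illustration: the indexed term is the input text minus the trailing 'Z'.
    static String toTerm(String input) {
        return input.endsWith("Z") ? input.substring(0, input.length() - 1) : input;
    }

    // Inclusive range check using plain lexicographic String ordering on the terms.
    static boolean inRange(String value, String lower, String upper) {
        String term = toTerm(value);
        return term.compareTo(toTerm(lower)) >= 0 && term.compareTo(toTerm(upper)) <= 0;
    }

    public static void main(String[] args) {
        // Four spellings of the same instant.
        List<String> sameInstant = List.of(
            "1995-12-31T23:59:59Z",
            "1995-12-31T23:59:59.0Z",
            "1995-12-31T23:59:59.00Z",
            "1995-12-31T23:59:59.000Z");
        String lower = "1995-12-31T00:00:00Z";
        String upper = "1995-12-31T23:59:59Z";
        for (String s : sameInstant) {
            System.out.println(s + " in range: " + inRange(s, lower, upper));
        }
    }
}
```

Under that assumption only the first spelling falls inside the range, because the longer strings sort after the upper end point even though they denote the same instant.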

      Any solution would require true parsing & normalizing of any date input (currently dates are only parsed if they involve DateMath) and complete reindexing
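As a rough idea of what "parsing & normalizing" could look like, here is a minimal sketch using java.time. It is not the parser from any Solr patch; the class and method names are hypothetical, and truncating to millisecond precision is an assumption.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

final class DateNormalizer {
    // Parses an ISO-8601 UTC timestamp and re-emits it with no trailing zeros
    // in the fractional seconds (and no fraction at all when millis are 0).
    static String canonicalize(String input) {
        Instant instant = Instant.parse(input);
        // Instant.toString() on a whole-second value prints no fraction, e.g. 1995-12-31T23:59:59Z
        String seconds = instant.truncatedTo(ChronoUnit.SECONDS).toString();
        String head = seconds.substring(0, seconds.length() - 1); // drop the 'Z'
        int millis = instant.getNano() / 1_000_000; // assumption: millisecond precision only
        if (millis == 0) {
            return head + "Z";
        }
        String fraction = String.format(Locale.ROOT, "%03d", millis).replaceAll("0+$", "");
        return head + "." + fraction + "Z";
    }

    public static void main(String[] args) {
        String[] inputs = {
            "1995-12-31T23:59:59Z",
            "1995-12-31T23:59:59.0Z",
            "1995-12-31T23:59:59.00Z",
            "1995-12-31T23:59:59.000Z",
            "1995-12-31T23:59:59.990Z"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + canonicalize(s));
        }
    }
}
```

With this kind of normalization applied to both indexed values and query input, the four spellings listed earlier collapse to a single string, which is why existing data would need reindexing.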

      NOTE: I don't personally think fixing this issue in DateField is worthwhile. I think it would be better to document it as a caveat and require people to be consistent in their usage of milliseconds (ie: if you are going to use them, then always use them even if they are 0).
      Instead we should probably focus on a new Long based Date Field (see SOLR-440) since that would always require parsing to get to the internal representation anyway.
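To illustrate why a long-based representation sidesteps the problem, the sketch below converts each spelling to epoch milliseconds. It only shows the idea; it is not the design proposed in SOLR-440, and the class and method names are hypothetical.

```java
import java.time.Instant;

final class EpochMillisDates {
    // Parsing is unavoidable on the way to a long, so every spelling of the
    // same instant ends up as the same internal value.
    static long toEpochMillis(String input) {
        return Instant.parse(input).toEpochMilli();
    }

    public static void main(String[] args) {
        long a = toEpochMillis("1995-12-31T23:59:59Z");
        long b = toEpochMillis("1995-12-31T23:59:59.000Z");
        System.out.println("same internal value: " + (a == b));
    }
}
```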

          Activity

          Yonik Seeley added a comment -

          From the schema:

          The format for this date field is of the form 1995-12-31T23:59:59Z, and
          is a more restricted form of the canonical representation of dateTime
          http://www.w3.org/TR/xmlschema-2/#dateTime
          The trailing "Z" designates UTC time and is mandatory.
          Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
          All other components are mandatory.

          Part of the canonical representation referenced states that

          The fractional second string, if present, must not end in '0';

          So not an oversight, but not enforced either.

          I agree with using a more efficient internal storage mechanism though. The current one really just stems from the observation that the values already sorted correctly if the 'Z' was lopped off.
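The "must not end in '0'" rule quoted above could be enforced with a simple shape check. This is a sketch under assumptions: the class name and regular expression are hypothetical, and the pattern validates only the textual form (it does not reject impossible values such as month 13).

```java
import java.util.regex.Pattern;

final class CanonicalDateCheck {
    // Date, 'T', time, an optional fraction whose last digit is 1-9, then a mandatory 'Z'.
    private static final Pattern CANONICAL = Pattern.compile(
        "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d*[1-9])?Z");

    static boolean isCanonical(String input) {
        return CANONICAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "1995-12-31T23:59:59Z",
            "1995-12-31T23:59:59.999Z",
            "1995-12-31T23:59:59.0Z",
            "1995-12-31T23:59:59.000Z"
        };
        for (String s : inputs) {
            System.out.println(s + " canonical: " + isCanonical(s));
        }
    }
}
```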

          Hoss Man added a comment -

          good point yonik ... perhaps we should just reiterate the "no trailing 0" aspect of the millis more strenuously in the docs. and move on to SOLR-440.

          Hoss Man added a comment -

          we should make sure we have something in 1.3 that addresses this (even if it is just documentation)

          Hoss Man added a comment -

          The patch in SOLR-470 both improves the documentation regarding trailing zeros and adds a parser that ensures all dates are in the canonical format.

          Committed revision 658003.


            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              1
              Watchers:
              0
