Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8904

Switch from SimpleDateFormat to Java 8 DateTimeFormatter.ISO_INSTANT

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0, 6.1, 7.0
    • Component/s: None
    • Labels:
      None

      Description

      I'd like to move Solr away from SimpleDateFormat to Java 8's java.time.formatter.DateTimeFormatter API, particularly using simply ISO_INSTANT without any custom rules. This especially involves our DateFormatUtil class in Solr core, but also involves DateUtil (I filed SOLR-8903 to deal with additional delete/move/deprecations for that one).

      In particular, there's new Date(Instant.parse(d).toEpochMilli()) for parsing and DateTimeFormatter.ISO_INSTANT.format(d.toInstant()) for formatting. Simple & thread-safe!

      I want to simply cut over completely without having special custom rules. There are differences in how ISO_INSTANT does things:

      • Formatting: Milliseconds are 0 padded to 3 digits if the milliseconds is non-zero. Thus 30 milliseconds will have ".030" added on. Our current formatting code emits ".03".
      • Dates with years after '9999' (i.e. 10000 and beyond, >= 5 digit years): ISO_INSTANT strictly demands a leading '+' – it is formatted with a "+" and if such a year is parsed it must have a "+" or there is an exception. SimpleDateFormatter requires the opposite – no '+' and and if you tried to give it one, it would throw an exception.
      • Currently we don't support negative years (resulting in invisible errors mostly!). ISO_INSTANT supports this!

      In addition, DateFormatUtil.parseDate currently allows the trailing 'Z' to be optional, but the only caller that could exploit this is the analytics module. I'd like to remove the optional-ness of 'Z' and inline this method away to new Date(Instant.parse(d).toEpochMilli()).

      1. SOLR_8904_switch_from_SimpleDateFormat_to_Instant_parse_and_format.patch
        92 kB
        David Smiley
      2. SOLR_8904.patch
        100 kB
        David Smiley
      3. SOLR_8904.patch
        97 kB
        David Smiley

        Issue Links

          Activity

          Hide
          dsmiley David Smiley added a comment -

          Another apparent difference is that Solr currently accepts "1976-07-21T00:07:67.890Z" (notice 67 seconds!) whereas ISO_INSTANT is quite strict; it does not. I found this weird date in SimpleFacetsTest, without any comment to it's oddity.

          I'm working on a patch. In this patch, I removed DateFormatUtils and moved parseMath to DateMathParser. parseDate & formatDateExternal were one-liners of standard JDK calls so I felt no need for it. It'll make the patch bigger to digest though.

          Show
          dsmiley David Smiley added a comment - Another apparent difference is that Solr currently accepts "1976-07-21T00:07:67.890Z" (notice 67 seconds!) whereas ISO_INSTANT is quite strict; it does not. I found this weird date in SimpleFacetsTest, without any comment to it's oddity. I'm working on a patch. In this patch, I removed DateFormatUtils and moved parseMath to DateMathParser. parseDate & formatDateExternal were one-liners of standard JDK calls so I felt no need for it. It'll make the patch bigger to digest though.
          Hide
          dsmiley David Smiley added a comment -

          Another apparent distance is that Solr currently accepts 2012-1-01T12:30:00Z (notice single digit month without padding), seen from a literal string in DistributedFacetPivotLargeTest

          Show
          dsmiley David Smiley added a comment - Another apparent distance is that Solr currently accepts 2012-1-01T12:30:00Z (notice single digit month without padding), seen from a literal string in DistributedFacetPivotLargeTest
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          1976-07-21T00:07:67.890Z" (notice 67 seconds!)

          Must have been unintended (notice that 67890 digits are all in a row... something one would type in randomly as a test number). And it's in a facets test no less!

          Any idea if there are significant performance differences?

          Show
          yseeley@gmail.com Yonik Seeley added a comment - 1976-07-21T00:07:67.890Z" (notice 67 seconds!) Must have been unintended (notice that 67890 digits are all in a row... something one would type in randomly as a test number). And it's in a facets test no less! Any idea if there are significant performance differences?
          Hide
          dsmiley David Smiley added a comment -

          FYI Instant is the new replacement for Date in Java 8. This patch is just about switching the parsing/formatting though; not to stop using Date. Instant.toString() uses DateTimeFormatter.ISO_INSTANT, so you can be pretty concise when formatting if you want to. Here's a code snippet (which I modified a test for) to build a formatted date string to send to Solr for a test specified variable number of minutes: ZonedDateTime.of(2016, 1, 1, 0, MM, 0, 0, ZoneOffset.UTC).toInstant().toString() (MM was a variable in a test method)

          In case anyone reading this missed my warning of these changes to the dev list, I did so: http://mail-archives.apache.org/mod_mbox/lucene-dev/201603.mbox/%3cCABEwPvGyqu=uDq8ULVAM1t_VX2Qg3++MOsDCYb7rcY5N=XMf4A@mail.gmail.com%3e
          No responses yet so I assume no issue with making a backwards incompatible change like this for 6.0.

          My tentative CHANGES.txt entry will be, this, under "Upgrading from Solr 5.x":

           * Date-time parsing has been made more strict (based on Java 8 java.time.DateTimeFormatter.ISO_INSTANT),
          but also now supports negative years.  AD years with >= 5 digits must have a leading '+' (formerly unsupported).
          BC years must have a '-' (formerly unsupported).  All parts of the format must be padded with zeros to the left 
          (e.g. "01" for January, never "1").  All parts must be within valid maximum boundaries (e.g. 67 seconds is now
           invalid).  Formatted times from Solr now have the milliseconds 3-digit padded if non-zero.
          

          And I'll add an entry below under "Other Changes", more succinct.

          Still running tests...

          Show
          dsmiley David Smiley added a comment - FYI Instant is the new replacement for Date in Java 8. This patch is just about switching the parsing/formatting though; not to stop using Date. Instant.toString() uses DateTimeFormatter.ISO_INSTANT, so you can be pretty concise when formatting if you want to. Here's a code snippet (which I modified a test for) to build a formatted date string to send to Solr for a test specified variable number of minutes: ZonedDateTime.of(2016, 1, 1, 0, MM, 0, 0, ZoneOffset.UTC).toInstant().toString() (MM was a variable in a test method) In case anyone reading this missed my warning of these changes to the dev list, I did so: http://mail-archives.apache.org/mod_mbox/lucene-dev/201603.mbox/%3cCABEwPvGyqu=uDq8ULVAM1t_VX2Qg3++MOsDCYb7rcY5N=XMf4A@mail.gmail.com%3e No responses yet so I assume no issue with making a backwards incompatible change like this for 6.0. My tentative CHANGES.txt entry will be, this, under "Upgrading from Solr 5.x": * Date-time parsing has been made more strict (based on Java 8 java.time.DateTimeFormatter.ISO_INSTANT), but also now supports negative years. AD years with >= 5 digits must have a leading '+' (formerly unsupported). BC years must have a '-' (formerly unsupported). All parts of the format must be padded with zeros to the left (e.g. "01" for January, never "1"). All parts must be within valid maximum boundaries (e.g. 67 seconds is now invalid). Formatted times from Solr now have the milliseconds 3-digit padded if non-zero. And I'll add an entry below under "Other Changes", more succinct. Still running tests...
          Hide
          steve_rowe Steve Rowe added a comment - - edited

          FYI, I used the new Java8 API to do random date generation in TestUseDocValuesAsStored as part of SOLR-8838: https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/schema/TestUseDocValuesAsStored.java#L217-L222

          Show
          steve_rowe Steve Rowe added a comment - - edited FYI, I used the new Java8 API to do random date generation in TestUseDocValuesAsStored as part of SOLR-8838 : https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/schema/TestUseDocValuesAsStored.java#L217-L222
          Hide
          dsmiley David Smiley added a comment -

          Yup I noticed it In my match (coming real soon now) I changed those couple lines of code to this more succinct version:

                    values[i] = DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(epochMillis));

          when you have the epochMillis, it's that simple. No need to use anything more complex. I could have even done:

                     values[i] = Instant.ofEpochMilli(epochMillis).toString(); 

          Show
          dsmiley David Smiley added a comment - Yup I noticed it In my match (coming real soon now ) I changed those couple lines of code to this more succinct version: values[i] = DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(epochMillis)); when you have the epochMillis, it's that simple. No need to use anything more complex. I could have even done: values[i] = Instant.ofEpochMilli(epochMillis).toString();
          Hide
          steve_rowe Steve Rowe added a comment -

          Don't remember the details, but I think I had to use DateTimeFormatter.ISO_LOCAL_DATE_TIME instead of ISO_INSTANT, something to do with millisecond roundtripping.

          Show
          steve_rowe Steve Rowe added a comment - Don't remember the details, but I think I had to use DateTimeFormatter.ISO_LOCAL_DATE_TIME instead of ISO_INSTANT, something to do with millisecond roundtripping.
          Hide
          dsmiley David Smiley added a comment -

          Here's the patch. Instead of referencing DateTimeFormatter in a bunch of places, I opted for the even more concise Instant.toString(). So DTF is only used in maybe a couple places. DateFormatUtil is gone.

          This patch also weens some callers of DateUtil to Instant.toString so long as that format was what was required based on the code. But DateUtil itself and some of its callers is being left to SOLR-8903.

          I'm running tests yet again. I had also run precommit.

          Show
          dsmiley David Smiley added a comment - Here's the patch. Instead of referencing DateTimeFormatter in a bunch of places, I opted for the even more concise Instant.toString(). So DTF is only used in maybe a couple places. DateFormatUtil is gone. This patch also weens some callers of DateUtil to Instant.toString so long as that format was what was required based on the code. But DateUtil itself and some of its callers is being left to SOLR-8903 . I'm running tests yet again. I had also run precommit.
          Hide
          dsmiley David Smiley added a comment -

          Hmm; ok I'll watch out for that. As it was I had to change it since dateTime.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME) + 'Z' truncated right 0 padded milliseconds (if I recall) whereas DateTimeFormatter.ISO_INSTANT does not unless it's entirely 0.

          Show
          dsmiley David Smiley added a comment - Hmm; ok I'll watch out for that. As it was I had to change it since dateTime.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME) + 'Z' truncated right 0 padded milliseconds (if I recall) whereas DateTimeFormatter.ISO_INSTANT does not unless it's entirely 0.
          Hide
          dsmiley David Smiley added a comment -

          This new patch adds parsing leniency in accordance with the Robustness Principle – "Be conservative in what you send, be liberal in what you accept". In the dev list I mentioned making this leniency settable via a System property but I am not making it toggle-able here. Perhaps it could be added later or we can let it be this way forever.

          Patch summary:

          • DateMathParser.parseMath now parses the date literal part using a new method, parseNoMath() that uses a static instance of DTF created like this: {{new DateTimeFormatterBuilder().
            .parseCaseInsensitive().parseLenient().appendInstant().toFormatter(Locale.ROOT)}}.
            • I have this method private because, in general, Solr should be parsing with "date math" from user input. When it's from Solr (e.g. SolrJ parsing dates), it can use Integer.parse which is strict. Solr is strict about formatting output.
            • parsing leniency is now explicitly tested. A leading "+" is always optional no matter how many digits are in the year. And numbers need not have leading 0 pads. However it's an error to have values out of bounds (e.g. >= 60 secs).
          • I changed ValueSourceAugmenter to call DateMathParser.parseMath (instead of without).
          • I changed all 3 spots in the analytics contrib module that parse dates to parse with math (instead of without).
          • I moved most testing in DateFieldTest into DateMathParserTest, doing a bit of consolidating/refactoring along the way. The only test that remains is testing createField. I made that test a bit more clear that TrieDateField parses with date math.

          New suggested CHANGES.txt:

           * Formatted date-times from Solr have some differences.  If the year is more
          than 4 digits, there is a leading '+'.  When there is a non-zero number of 
          milliseconds, it is padded with zeros to 3 digits.  Negative year (BC) dates are
          now possible.  It is now an error to supply a portion of the date out 
          of its, range, like 67 seconds.
          

          I think this is ready.

          Show
          dsmiley David Smiley added a comment - This new patch adds parsing leniency in accordance with the Robustness Principle – "Be conservative in what you send, be liberal in what you accept". In the dev list I mentioned making this leniency settable via a System property but I am not making it toggle-able here. Perhaps it could be added later or we can let it be this way forever. Patch summary: DateMathParser.parseMath now parses the date literal part using a new method, parseNoMath() that uses a static instance of DTF created like this: {{new DateTimeFormatterBuilder(). .parseCaseInsensitive().parseLenient().appendInstant().toFormatter(Locale.ROOT)}}. I have this method private because, in general, Solr should be parsing with "date math" from user input. When it's from Solr (e.g. SolrJ parsing dates), it can use Integer.parse which is strict. Solr is strict about formatting output. parsing leniency is now explicitly tested. A leading "+" is always optional no matter how many digits are in the year. And numbers need not have leading 0 pads. However it's an error to have values out of bounds (e.g. >= 60 secs). I changed ValueSourceAugmenter to call DateMathParser.parseMath (instead of without). I changed all 3 spots in the analytics contrib module that parse dates to parse with math (instead of without). I moved most testing in DateFieldTest into DateMathParserTest , doing a bit of consolidating/refactoring along the way. The only test that remains is testing createField. I made that test a bit more clear that TrieDateField parses with date math. New suggested CHANGES.txt: * Formatted date-times from Solr have some differences. If the year is more than 4 digits, there is a leading '+'. When there is a non-zero number of milliseconds, it is padded with zeros to 3 digits. Negative year (BC) dates are now possible. It is now an error to supply a portion of the date out of its, range, like 67 seconds. I think this is ready.
          Hide
          dsmiley David Smiley added a comment -

          Sorry; this patch now has 2 changes that were in my changelist for SOLR-8903 that better belong here. ValueSourceAugmenter [value], and ValueSourceParser ms(). That makes hits patch stand-alone.

          Show
          dsmiley David Smiley added a comment - Sorry; this patch now has 2 changes that were in my changelist for SOLR-8903 that better belong here. ValueSourceAugmenter [value] , and ValueSourceParser ms() . That makes hits patch stand-alone.
          Hide
          steve_rowe Steve Rowe added a comment - - edited

          +1, lots of nice simplification. I like how in TestUseDocValuesAsStored you expanded the random date range to include negative dates and dates much farther into the future.

          All Solr tests passed for me.

          Two nits:

          • In DateMathParser.parseMath() you carried over from DateMathUtil the obsolete commented out code p.setNow(toObject(val.substring(0,zz))); - I think it should be removed.
          • DateFieldTest has unused imports

          In DateMathParser.parseNoMath() you have a TODO to make a Date-returning temporal query - here's an (untested) one (probably should not hold up this issue though):

            /** Temporal query to convert from TemporalAccessor to Date. */
            private static Date temporalToDate(TemporalAccessor temporal) {
              long seconds = temporal.getLong(INSTANT_SECONDS);
              int nanos = temporal.get(NANO_OF_SECOND);
              if (seconds < 0 && nanos > 0) {  // from Instant.toEpochMilli()
                long millis = Math.multiplyExact(seconds + 1, 1_000);  // millis is now <= 0
                long adjustment = nanos / 1_000_000 - 1_000;           // adjustment is now negative
                return new Date(Math.addExact(millis, adjustment));
              } else {
                long millis = Math.multiplyExact(seconds, 1_000);
                return new Date(Math.addExact(millis, nanos / 1_000_000));
              }
            }
          
          Show
          steve_rowe Steve Rowe added a comment - - edited +1, lots of nice simplification. I like how in TestUseDocValuesAsStored you expanded the random date range to include negative dates and dates much farther into the future. All Solr tests passed for me. Two nits: In DateMathParser.parseMath() you carried over from DateMathUtil the obsolete commented out code p.setNow(toObject(val.substring(0,zz))); - I think it should be removed. DateFieldTest has unused imports In DateMathParser.parseNoMath() you have a TODO to make a Date-returning temporal query - here's an (untested) one (probably should not hold up this issue though): /** Temporal query to convert from TemporalAccessor to Date. */ private static Date temporalToDate(TemporalAccessor temporal) { long seconds = temporal.getLong(INSTANT_SECONDS); int nanos = temporal.get(NANO_OF_SECOND); if (seconds < 0 && nanos > 0) { // from Instant.toEpochMilli() long millis = Math .multiplyExact(seconds + 1, 1_000); // millis is now <= 0 long adjustment = nanos / 1_000_000 - 1_000; // adjustment is now negative return new Date( Math .addExact(millis, adjustment)); } else { long millis = Math .multiplyExact(seconds, 1_000); return new Date( Math .addExact(millis, nanos / 1_000_000)); } }
          Hide
          dsmiley David Smiley added a comment -

          I'm glad you like the simplifications; that was definitely part of this. The 2 nits are small enough that I'll just go commit with them done now.

          Thanks for that code snippet. As I was working on the code, I resisted taking this step because it would also require some tests, and could conceivably be combined in another issue with optimizations to avoid creating an instance of DateMathParser when there is no math to parse.

          As an aside, I have no idea how much faster (or slower?) date formatting and parsing is. I suspect faster given no need to worry about thread-safety.

          Show
          dsmiley David Smiley added a comment - I'm glad you like the simplifications; that was definitely part of this. The 2 nits are small enough that I'll just go commit with them done now. Thanks for that code snippet. As I was working on the code, I resisted taking this step because it would also require some tests, and could conceivably be combined in another issue with optimizations to avoid creating an instance of DateMathParser when there is no math to parse. As an aside, I have no idea how much faster (or slower?) date formatting and parsing is. I suspect faster given no need to worry about thread-safety.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 94c04237cce44cac1e40e1b8b6ee6a6addc001a5 in lucene-solr's branch refs/heads/master from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=94c0423 ]

          SOLR-8904: switch from SimpleDateFormat to Instant.parse and format.
          [value] and ms() and contrib/analytics now call DateMathParser to parse. DateFormatUtil is now removed.

          Show
          jira-bot ASF subversion and git services added a comment - Commit 94c04237cce44cac1e40e1b8b6ee6a6addc001a5 in lucene-solr's branch refs/heads/master from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=94c0423 ] SOLR-8904 : switch from SimpleDateFormat to Instant.parse and format. [value] and ms() and contrib/analytics now call DateMathParser to parse. DateFormatUtil is now removed.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 72f5eac2c5e7fb743f166fb3c1b25e73078ebdbe in lucene-solr's branch refs/heads/branch_6x from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=72f5eac ]

          SOLR-8904: switch from SimpleDateFormat to Instant.parse and format.
          [value] and ms() and contrib/analytics now call DateMathParser to parse. DateFormatUtil is now removed.
          (cherry picked from commit 94c0423) (cherry picked from commit 39932f5)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 72f5eac2c5e7fb743f166fb3c1b25e73078ebdbe in lucene-solr's branch refs/heads/branch_6x from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=72f5eac ] SOLR-8904 : switch from SimpleDateFormat to Instant.parse and format. [value] and ms() and contrib/analytics now call DateMathParser to parse. DateFormatUtil is now removed. (cherry picked from commit 94c0423) (cherry picked from commit 39932f5)
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit a47a0baa829ee18b30cc391a229b96d85c865ea0 in lucene-solr's branch refs/heads/branch_6_0 from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a47a0ba ]

          SOLR-8904: switch from SimpleDateFormat to Instant.parse and format.
          [value] and ms() and contrib/analytics now call DateMathParser to parse. DateFormatUtil is now removed.
          (cherry picked from commit 72f5eac)

          Show
          jira-bot ASF subversion and git services added a comment - Commit a47a0baa829ee18b30cc391a229b96d85c865ea0 in lucene-solr's branch refs/heads/branch_6_0 from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a47a0ba ] SOLR-8904 : switch from SimpleDateFormat to Instant.parse and format. [value] and ms() and contrib/analytics now call DateMathParser to parse. DateFormatUtil is now removed. (cherry picked from commit 72f5eac)
          Hide
          hossman Hoss Man added a comment -

          Manually correcting fixVersion per Step #S6 of LUCENE-7271

          Show
          hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S6 of LUCENE-7271

            People

            • Assignee:
              dsmiley David Smiley
              Reporter:
              dsmiley David Smiley
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development