Pig
  1. Pig
  2. PIG-1781

Piggybank: ISOToDay disregards timezone (should use ISODateTimeFormat instead of DateTime to parse)

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Hide
      in UDFs in org.apache.pig.piggybank.test.evaluation.datetime.truncate, add support for non-UTC timezones in ISO 8601 datetime strings,


      Show
      in UDFs in org.apache.pig.piggybank.test.evaluation.datetime.truncate, add support for non-UTC timezones in ISO 8601 datetime strings,

      Description

      (Apologies if this is the wrong place to file Piggybank bugs)

      Bug in http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToDay.java?view=markup

      and other http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/truncate/ classes that copy-paste the same code.

      These classes parse dates like so:
      DateTimeZone.setDefault(DateTimeZone.UTC);
      DateTime dt = new DateTime((String)input.get(0).toString());

      This has two problems:
      (1) It messes up JVM static state by changing the DateTimeZone default time zone.
      (2) It ignore timezone information in the input string, so times like "2009-12-09T23:59:59-0800" get truncated to "2009-12-10T00:00:00Z", which is the wrong day of year.

      Instead, they should use something like this, which respects the input timezone and does not modify any global state:

      DateTime dt ISODateTimeFormat.dateTime().withOffsetParsed().parseDateTime(isoDateString);

      I have not provided a patch, because I'm not really set up to hack on Piggybank locally.

      As a workaround, I am copy-pasting the classes into my own packages, and making the desired change.

      1. ASF.LICENSE.NOT.GRANTED--PIG-1781.4.patch
        27 kB
        Michael Brauwerman
      2. ASF.LICENSE.NOT.GRANTED--PIG-1781.3.patch
        27 kB
        Michael Brauwerman
      3. PIG-1781.2.patch
        22 kB
        Michael Brauwerman

        Activity

        Hide
        Michael Brauwerman added a comment -

        This patch file was generated by "cd contrib/piggybank/java && svn diff". I do not know if that is the desired format.

        This is a patch against trunk.

        Notes on the code:

        • Tests included for logic change, new feature (non-UTC support), and new class's methods.
        • In addition to fixing the bug, I refactored a bit of common code into a new class ISOHelper.
        • Whitespace is a bit funky, because I made the changes in XCode, not Eclipse.
        • It might work well against 0.8.0 as well. At a glance, I didn't see code changes in the affected classes.
        • Example code (in the comments) was not updated to include examples of non-UTC usage.
        Show
        Michael Brauwerman added a comment - This patch file was generated by "cd contrib/piggybank/java && svn diff". I do not know if that is the desired format. This is a patch against trunk. Notes on the code: Tests included for logic change, new feature (non-UTC support), and new class's methods. In addition to fixing the bug, I refactored a bit of common code into a new class ISOHelper. Whitespace is a bit funky, because I made the changes in XCode, not Eclipse. It might work well against 0.8.0 as well. At a glance, I didn't see code changes in the affected classes. Example code (in the comments) was not updated to include examples of non-UTC usage.
        Hide
        Michael Brauwerman added a comment -

        This is the patch I intended to submit along with my previous submissionn(the "Submit Patch" Workflow Action).

        Show
        Michael Brauwerman added a comment - This is the patch I intended to submit along with my previous submissionn(the "Submit Patch" Workflow Action).
        Hide
        Michael Brauwerman added a comment -

        (Cancelling patch, because I read http://wiki.apache.org/pig/HowToContribute and saw that "Submit Patch" is for a more formal submission. I apologize.)

        Show
        Michael Brauwerman added a comment - (Cancelling patch, because I read http://wiki.apache.org/pig/HowToContribute and saw that "Submit Patch" is for a more formal submission. I apologize.)
        Hide
        Michael Brauwerman added a comment -

        Re-uploading patch with ".patch" extension.

        Show
        Michael Brauwerman added a comment - Re-uploading patch with ".patch" extension.
        Hide
        Michael Brauwerman added a comment -

        Re-Submitting Patch after re-uploading patch with proper filename.

        Show
        Michael Brauwerman added a comment - Re-Submitting Patch after re-uploading patch with proper filename.
        Hide
        Alan Gates added a comment -

        One quick comment: our coding conventions are to use spaces and not tabs. See http://wiki.apache.org/pig/HowToContribute where it talks about our coding conventions (search on Sun to find it).

        FYI most of us are off for Christmas after today, so it may be after New Years before someone gets around to reviewing this.

        Russell, did you want to take a look at this since you did a lot of the original ISO work?

        Show
        Alan Gates added a comment - One quick comment: our coding conventions are to use spaces and not tabs. See http://wiki.apache.org/pig/HowToContribute where it talks about our coding conventions (search on Sun to find it). FYI most of us are off for Christmas after today, so it may be after New Years before someone gets around to reviewing this. Russell, did you want to take a look at this since you did a lot of the original ISO work?
        Hide
        Michael Brauwerman added a comment -

        Updated patch, created at trunk/contrib/piggybank.
        'ant test' ran to BUILD SUCCESSFUL

        New Changes in this patch:

        • Re-indented all touched files to use 4-spaces instead of tabs.
        • Removed some stray System.out.println lines in some tests
        • Changed some assertTrue(foo.equals(bar)) to assertEquals(foo, bar) (for clearer output when tests fail.)

        Changes between repo and this patch, that I forgot to mention earlier:

        • Changed some assertTrue(foo.equals(bar)) to assertEquals(foo, bar) (for clearer output when tests fail.)

        Concerns:

        • piggybank does not have a 'doc' or 'test-patch' target, so I could not run those. Particularly, I couldn't run the "hudson auto-review" locally.
        • I'm not 100% on the interaction between core pig and piggybank in the build files, so I hope I did not break or overlook anything in that interaction.

        I got Eclipse set up to develop on Pig, which is much nicer than Xcode. Yay.

        OK, I hope this is helpful! It would be swell if this change got merged the 0.8 branch also (if an 0.8.1 release is planned), but I'm in no personal rush, since I have a local build that works for my project.

        Oh, and if this behavior change is undesired for some reason, no hard feelings. I don't know if anyone else is using Piggybank for bucketing-by-day analysis in non-UTC timezones, but I am, and so the truncate package is quite handy for me; I just need a version that works on US/Pacific-timezone dateTimes.

        Show
        Michael Brauwerman added a comment - Updated patch, created at trunk/contrib/piggybank. 'ant test' ran to BUILD SUCCESSFUL New Changes in this patch: Re-indented all touched files to use 4-spaces instead of tabs. Removed some stray System.out.println lines in some tests Changed some assertTrue(foo.equals(bar)) to assertEquals(foo, bar) (for clearer output when tests fail.) Changes between repo and this patch, that I forgot to mention earlier: Changed some assertTrue(foo.equals(bar)) to assertEquals(foo, bar) (for clearer output when tests fail.) Concerns: piggybank does not have a 'doc' or 'test-patch' target, so I could not run those. Particularly, I couldn't run the "hudson auto-review" locally. I'm not 100% on the interaction between core pig and piggybank in the build files, so I hope I did not break or overlook anything in that interaction. I got Eclipse set up to develop on Pig, which is much nicer than Xcode. Yay. OK, I hope this is helpful! It would be swell if this change got merged the 0.8 branch also (if an 0.8.1 release is planned), but I'm in no personal rush, since I have a local build that works for my project. Oh, and if this behavior change is undesired for some reason, no hard feelings. I don't know if anyone else is using Piggybank for bucketing-by-day analysis in non-UTC timezones, but I am, and so the truncate package is quite handy for me; I just need a version that works on US/Pacific-timezone dateTimes.
        Hide
        Dmitriy V. Ryaboy added a comment -

        Michael,
        Thanks a lot for contributing, I think this is a useful feature.

        We are not planning a 0.8.1 at the moment, but it's possible we will separate piggybank from Pig releases, so this would become available before 0.9.

        Overall the patch looks good, but I think that you still want to set the default time zone to be UTC, not whatever the system default is. So you will want to put the DateTimeZone.setDefault(DateTimeZone.UTC) call into the new helper class.

        Also, fyi, we generally generate patches, even for piggybank, from the top-level directory, the one that has contrib/ in it.

        No need to apologize for not getting the patch submission process quite right, we are not above helping with that part, and would much rather someone submit a useful patch that's not quite formatted right, than not get the contribution .

        Show
        Dmitriy V. Ryaboy added a comment - Michael, Thanks a lot for contributing, I think this is a useful feature. We are not planning a 0.8.1 at the moment, but it's possible we will separate piggybank from Pig releases, so this would become available before 0.9. Overall the patch looks good, but I think that you still want to set the default time zone to be UTC, not whatever the system default is. So you will want to put the DateTimeZone.setDefault(DateTimeZone.UTC) call into the new helper class. Also, fyi, we generally generate patches, even for piggybank, from the top-level directory, the one that has contrib/ in it. No need to apologize for not getting the patch submission process quite right, we are not above helping with that part, and would much rather someone submit a useful patch that's not quite formatted right, than not get the contribution .
        Hide
        Michael Brauwerman added a comment -

        New patch PIG-1781.3.patch, supersedes PIG-1781.2.patch

        Changes between PIG-1781.2.patch and PIG-1781.3.patch:

        • Changed ISOHelper's dateTime() to dateTimeParser(), which more closely matches the pre-patch behavior of accepting a date or time or dateTime.
        • Renamed parseDate() to parseDateTime(), and renamed corresponding tests.
        • Added a public constant ISOHelper#DEFAULT_DATE_TIME_ZONE to advertise the fact that ISOHelper has its own default time zone for parsing.
        • Added code to ISOHelper#parseDateTime to use UTC as default time zone for parsing ambiguous dates.
        • Added code to save/restore the System's default time zone. *This is a change in behavior from the pre-patch code.**
        • Added unit tests:
          • to illustrate various corner cases of date/time/dateTime and UTC/other-time-zone/no-time-zone parsing
          • to illustrate how default time zones are managed.
        • ran 'svn diff' at base of pig tree.

        "ant clean test" in contrib directory succeeded.

        I hope this version addresses everyone's concerns, but let me know if more changes are needed.

        Regarding the general issue of time zones. I see the options for behavior like so:
        (A) Use the System default time zone;
        (B) Use ISOHelper's preferred time zone (UTC) to parse ambiguous times, but don't touch System's default time zone;
        (C) Use ISOHelper's preferred time zone (UTC) to parse ambiguous times, and mutate the System time zone to match.
        (D) Let the Pig User set a time zone for parsing ambiguous dates, independently of the System default time zone, with a parser default to either UTC or System's default.

        The pre-patch 0.8.0 code behaves as option (C). This patch changes the behavior to be option (B) (with a tiny, unavoidable, race condition). (I think option (D) would be best, but that is more work that no one has requested yet.)

        Let me know if you disagree with my choice.

        (I think the best practice for Pig Users to assign timezones as early as possible in the data processing pipeline, and keep dates tagged with time zones throughout the pipeline.)

        Show
        Michael Brauwerman added a comment - New patch PIG-1781 .3.patch, supersedes PIG-1781 .2.patch Changes between PIG-1781 .2.patch and PIG-1781 .3.patch: Changed ISOHelper's dateTime() to dateTimeParser(), which more closely matches the pre-patch behavior of accepting a date or time or dateTime. Renamed parseDate() to parseDateTime(), and renamed corresponding tests. Added a public constant ISOHelper#DEFAULT_DATE_TIME_ZONE to advertise the fact that ISOHelper has its own default time zone for parsing. Added code to ISOHelper#parseDateTime to use UTC as default time zone for parsing ambiguous dates. Added code to save/restore the System's default time zone. * This is a change in behavior from the pre-patch code. ** Added unit tests: to illustrate various corner cases of date/time/dateTime and UTC/other-time-zone/no-time-zone parsing to illustrate how default time zones are managed. ran 'svn diff' at base of pig tree. "ant clean test" in contrib directory succeeded. I hope this version addresses everyone's concerns, but let me know if more changes are needed. Regarding the general issue of time zones. I see the options for behavior like so: (A) Use the System default time zone; (B) Use ISOHelper's preferred time zone (UTC) to parse ambiguous times, but don't touch System's default time zone; (C) Use ISOHelper's preferred time zone (UTC) to parse ambiguous times, and mutate the System time zone to match. (D) Let the Pig User set a time zone for parsing ambiguous dates, independently of the System default time zone, with a parser default to either UTC or System's default. The pre-patch 0.8.0 code behaves as option (C). This patch changes the behavior to be option (B) (with a tiny, unavoidable, race condition). (I think option (D) would be best, but that is more work that no one has requested yet.) Let me know if you disagree with my choice. (I think the best practice for Pig Users to assign timezones as early as possible in the data processing pipeline, and keep dates tagged with time zones throughout the pipeline.)
        Hide
        Michael Brauwerman added a comment -

        This patch supersedes PIG-1781.3.patch

        I had an unsaved change in Eclipse, so the previous patch was missing some code.

        Show
        Michael Brauwerman added a comment - This patch supersedes PIG-1781 .3.patch I had an unsaved change in Eclipse, so the previous patch was missing some code.
        Hide
        Alan Gates added a comment -

        I ran the contrib tests (including the ones added in this patch) and they all pass.

        Dmitriy, Russell, any comments or concerns before I check this in?

        Show
        Alan Gates added a comment - I ran the contrib tests (including the ones added in this patch) and they all pass. Dmitriy, Russell, any comments or concerns before I check this in?
        Hide
        Dmitriy V. Ryaboy added a comment -

        +1

        Show
        Dmitriy V. Ryaboy added a comment - +1
        Hide
        Alan Gates added a comment -

        Patch checked in. Thank you Michael.

        Show
        Alan Gates added a comment - Patch checked in. Thank you Michael.

          People

          • Assignee:
            Michael Brauwerman
            Reporter:
            Michael Brauwerman
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development