Uploaded image for project: 'Flume'
  1. Flume
  2. FLUME-2725

HDFS Sink does not use configured timezone for rounding

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      When a BucketPath used by an HDFS sink is configured to run with some roundUnit and roundValue > 1 (e.g. 6 hours), the "roundDown" function used by BucketPath does not actually round the date correctly.

      That function calls TimestampRoundDownUtil which creates a Calendar instance using the local timezone to truncate a unix timestamp rather than the TimeZone that the sink was configured to convert dates to paths with (and that timezone is already available in the BucketPath class but it just isn't passed to TimestampRoundDownUtil).

      The net effect of this is that if a flume jvm is running on a system with an EST clock while trying to write, say, 6 hour directories in UTC time, the directories are written with the hours 04, 10, 16, 22 rather than 00, 06, 12, 18 like you would expect.

      I found a workaround for this by passing "-Duser.timezone=<hdfs_sink_timezone>" as a system property, but I wanted to create a ticket for this since it seems like it would be very minimal effort to carry that configured timezone down into the rounding utility as well.

      1. FLUME-2725-3.patch
        22 kB
        Denes Arvay
      2. FLUME-2725-2.patch
        16 kB
        Denes Arvay
      3. FLUME-2725.patch
        16 kB
        Denes Arvay

        Issue Links

          Activity

          Hide
          denes Denes Arvay added a comment -

          I've attached a patch to use the configured time zone for rounding but I'm a bit concerned about this change as it might break stuff at users who already adapted to the current behavior. Shouldn't we put this behind a configuration parameter, wdyt?

          Show
          denes Denes Arvay added a comment - I've attached a patch to use the configured time zone for rounding but I'm a bit concerned about this change as it might break stuff at users who already adapted to the current behavior. Shouldn't we put this behind a configuration parameter, wdyt?
          Hide
          mpercy Mike Percy added a comment -

          Added a minor comment to the code review. I took a quick look at the patch and it looks pretty good.

          Overall, I think a config param to maintain backcompat could be useful in some circumstances, but the fact that the rounding did not use the timezone in the first place seems like a bug to me. Overall, I don't have a strong opinion. Maybe others have a strong opinion about it.

          Show
          mpercy Mike Percy added a comment - Added a minor comment to the code review. I took a quick look at the patch and it looks pretty good. Overall, I think a config param to maintain backcompat could be useful in some circumstances, but the fact that the rounding did not use the timezone in the first place seems like a bug to me. Overall, I don't have a strong opinion. Maybe others have a strong opinion about it.
          Hide
          hshreedharan Hari Shreedharan added a comment -

          I agree it is a bug. I don't think we actually need it to be backwards compatible - it is a bug fix, we don't need bug fixes to be backwards compatible.

          Show
          hshreedharan Hari Shreedharan added a comment - I agree it is a bug. I don't think we actually need it to be backwards compatible - it is a bug fix, we don't need bug fixes to be backwards compatible.
          Hide
          denes Denes Arvay added a comment - - edited

          new patch: rebased on trunk + fixed checkstyle errors

          Show
          denes Denes Arvay added a comment - - edited new patch: rebased on trunk + fixed checkstyle errors
          Hide
          mpercy Mike Percy added a comment -

          +1. I am about to commit this.

          Show
          mpercy Mike Percy added a comment - +1. I am about to commit this.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit ec28b66246f1f165ccaf01abf7fb27adebc9e4bb in flume's branch refs/heads/trunk from Denes Arvay
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=ec28b66 ]

          FLUME-2725. HDFS Sink does not use configured timezone for rounding

          (Denes Arvay via Mike Percy)

          Show
          jira-bot ASF subversion and git services added a comment - Commit ec28b66246f1f165ccaf01abf7fb27adebc9e4bb in flume's branch refs/heads/trunk from Denes Arvay [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=ec28b66 ] FLUME-2725 . HDFS Sink does not use configured timezone for rounding (Denes Arvay via Mike Percy)
          Hide
          mpercy Mike Percy added a comment -

          Pushed to trunk. Thanks for the patch, Denes!

          Show
          mpercy Mike Percy added a comment - Pushed to trunk. Thanks for the patch, Denes!
          Hide
          mpercy Mike Percy added a comment -

          Also, thank you Attila for helping to review this patch!

          Show
          mpercy Mike Percy added a comment - Also, thank you Attila for helping to review this patch!
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Flume-trunk-hbase-1 #179 (See https://builds.apache.org/job/Flume-trunk-hbase-1/179/)
          FLUME-2725. HDFS Sink does not use configured timezone for rounding (mpercy: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=ec28b66246f1f165ccaf01abf7fb27adebc9e4bb)

          • flume-ng-core/src/test/java/org/apache/flume/tools/TestTimestampRoundDownUtil.java
          • flume-ng-core/src/main/java/org/apache/flume/tools/TimestampRoundDownUtil.java
          • flume-ng-core/src/test/java/org/apache/flume/formatter/output/TestBucketPath.java
          • flume-ng-core/src/main/java/org/apache/flume/formatter/output/BucketPath.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Flume-trunk-hbase-1 #179 (See https://builds.apache.org/job/Flume-trunk-hbase-1/179/ ) FLUME-2725 . HDFS Sink does not use configured timezone for rounding (mpercy: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=ec28b66246f1f165ccaf01abf7fb27adebc9e4bb ) flume-ng-core/src/test/java/org/apache/flume/tools/TestTimestampRoundDownUtil.java flume-ng-core/src/main/java/org/apache/flume/tools/TimestampRoundDownUtil.java flume-ng-core/src/test/java/org/apache/flume/formatter/output/TestBucketPath.java flume-ng-core/src/main/java/org/apache/flume/formatter/output/BucketPath.java

            People

            • Assignee:
              denes Denes Arvay
              Reporter:
              eczech Eric Czech
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development