Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 1.6.0
    • Component/s: Configuration
    • Labels:
      None

      Description

      Flume currently zero-pads date components. It would be valuable if Flume could also format them as plain integers, i.e. without padding.

      For example, the %Y, %m, or %d alias could be used to create output directories that reference today's date, like the following:
      /output/2014/3/5/

      This would be especially helpful when importing the data into either Hive or Impala.

      First, Impala cannot pad partition values, so currently the only way to do this is to import the data with Hive and then use Impala to access it (although you could write custom code).

      Second, padding partitions in Hive or Impala causes issues; for example, pruning of padded partitions is not possible.

      The following is an example of a typical workflow:

      Data is imported into HDFS using Flume, with the sink configured as follows:
      agent.sinks.snk_avro_snappy.hdfs.path = hdfs://hdfs/avro/year=%Y/month=%m/day=%d

      Impala reads the data as follows:
      create external table TestAvro (.....)
      partitioned by (Year int, Month int, Day int) stored as avro
      location '/avro';
      alter table TestAvro add if not exists partition(Year=cast(year(to_date(now())) as int), Month=cast(month(to_date(now())) as int), Day=cast(day(to_date(now())) as int));

      Flume saves the output as
      hdfs://hdfs/avro/year=2014/month=12/day=01

      And Impala reads it as:
      hdfs://hdfs/avro/year=2014/month=12/day=1
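
      The mismatch above can be reproduced outside Flume. The following is a minimal sketch, assuming only the standard java.text.SimpleDateFormat patterns ("dd" pads, "d" does not); the class and method names are hypothetical and are not part of Flume:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not Flume code.
final class PaddingDemo {

    // Builds a partition-style path from the given month and day patterns.
    // Literal text is single-quoted so SimpleDateFormat does not treat it as pattern letters.
    static String partitionPath(Calendar cal, String monthPattern, String dayPattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(
            "'year='yyyy'/month='" + monthPattern + "'/day='" + dayPattern, Locale.ROOT);
        fmt.setTimeZone(cal.getTimeZone());
        return fmt.format(cal.getTime());
    }

    public static void main(String[] args) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(2014, Calendar.DECEMBER, 1);

        // Padded patterns, matching what %m and %d produce today.
        System.out.println(partitionPath(cal, "MM", "dd")); // year=2014/month=12/day=01

        // Unpadded patterns, matching the integer partition values Impala uses.
        System.out.println(partitionPath(cal, "M", "d"));   // year=2014/month=12/day=1
    }
}
```

      The two strings differ only in the day component, which is enough for the integer partition Day=1 to point at a directory that Flume never wrote.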

      This feature request is therefore to let Flume write data into a directory named from today's date, with no padding on the day or month field.
      Implementation details are not important; for example, a macro that simply removes padding could be added instead of futzing with the date aliases.

      1. Flume-2570.patch
        4 kB
        Peter Leckie
      2. patch
        4 kB
        Peter Leckie

        Activity

        gwenshap Gwen Shapira added a comment -

        Peter Leckie: If I understand the request correctly, you'd like to add a flag for unpadded-day and unpadded-month?

        petel Peter Leckie added a comment -

        Gwen Shapira: Yes, some form of new alias (escape sequence), flag, or pad-stripping macro that would allow the existing definition:
        hdfs://hdfs/avro/year=%Y/month=%m/day=%d

        To translate into:
        hdfs://hdfs/avro/year=2014/month=8/day=1

        For simplicity of use, my preference would be two new aliases, for example:
        hdfs://hdfs/avro/year=%Y/month=%u/day=%v

        Where:
        %u non-padded month (1..12)
        %v non-padded day of month (1..31)

        hshreedharan Hari Shreedharan added a comment -

        Hi Peter Leckie - Could you also add a test case for this? Thanks!

        petel Peter Leckie added a comment -

        Change for the implementation of non-padded dates.
        New options are %e for day and %n for month.
        Includes test case testNoPadding(), which confirms that %e and %n follow the same logic as SimpleDateFormat.
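
        To illustrate how the new options relate to SimpleDateFormat, here is a simplified sketch of escape expansion. It is not the code in BucketPath.java; the class name, the lookup table, and the expand() method are all hypothetical, and only the five escapes discussed in this issue are handled:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical sketch, not Flume's BucketPath implementation.
final class EscapeSketch {

    // Assumed mapping from escape character to SimpleDateFormat pattern.
    private static final Map<Character, String> PATTERNS = new HashMap<>();
    static {
        PATTERNS.put('Y', "yyyy"); // four-digit year
        PATTERNS.put('m', "MM");   // month, zero-padded (01..12)
        PATTERNS.put('d', "dd");   // day of month, zero-padded (01..31)
        PATTERNS.put('n', "M");    // month, no padding (1..12)
        PATTERNS.put('e', "d");    // day of month, no padding (1..31)
    }

    // Replaces each known %X escape in the template with the formatted date part.
    // Unknown escapes and all other characters are copied through unchanged.
    static String expand(String template, long timestampMillis, TimeZone zone) {
        Date date = new Date(timestampMillis);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c == '%' && i + 1 < template.length()) {
                String pattern = PATTERNS.get(template.charAt(i + 1));
                if (pattern != null) {
                    SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
                    fmt.setTimeZone(zone);
                    out.append(fmt.format(date));
                    i++; // skip the escape character that was just consumed
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

        With a timestamp for 1 August 2014 (UTC), expanding hdfs://hdfs/avro/year=%Y/month=%n/day=%e in this sketch gives hdfs://hdfs/avro/year=2014/month=8/day=1, while the %m and %d forms give month=08/day=01.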

        hshreedharan Hari Shreedharan added a comment -

        +1. Looks good, committing.

        jira-bot ASF subversion and git services added a comment -

        Commit 3d03053615694ca638e5ddf314081826b8a5f1ac in flume's branch refs/heads/trunk from Hari Shreedharan
        [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=3d03053 ]

        FLUME-2570. Add option to not pad date fields.

        (Peter Leckie via Hari)

        jira-bot ASF subversion and git services added a comment -

        Commit fa7ead55c34fe43aa4afd160ca40c61b5fb6be8d in flume's branch refs/heads/flume-1.6 from Hari Shreedharan
        [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=fa7ead5 ]

        FLUME-2570. Add option to not pad date fields.

        (Peter Leckie via Hari)

        hshreedharan Hari Shreedharan added a comment -

        Committed! Thanks Peter!

        hudson Hudson added a comment -

        UNSTABLE: Integrated in flume-trunk #718 (See https://builds.apache.org/job/flume-trunk/718/)
        FLUME-2570. Add option to not pad date fields. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=3d03053615694ca638e5ddf314081826b8a5f1ac)

        • flume-ng-doc/sphinx/FlumeUserGuide.rst
        • flume-ng-core/src/main/java/org/apache/flume/formatter/output/BucketPath.java
        • flume-ng-core/src/test/java/org/apache/flume/formatter/output/TestBucketPath.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Flume-trunk-hbase-98 #75 (See https://builds.apache.org/job/Flume-trunk-hbase-98/75/)
        FLUME-2570. Add option to not pad date fields. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=3d03053615694ca638e5ddf314081826b8a5f1ac)

        • flume-ng-core/src/main/java/org/apache/flume/formatter/output/BucketPath.java
        • flume-ng-core/src/test/java/org/apache/flume/formatter/output/TestBucketPath.java
        • flume-ng-doc/sphinx/FlumeUserGuide.rst

          People

          • Assignee:
            petel Peter Leckie
            Reporter:
            petel Peter Leckie
          • Votes:
            0
            Watchers:
            7
