Flume / FLUME-2570

Add option to not pad date fields

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 1.6.0
    • Component/s: Configuration
    • Labels:
      None

      Description

      Flume currently zero-pads the date components it writes. It would be valuable if Flume could also format them as plain integers, i.e. without padding.

      For example, a path built from the %Y, %m and %d aliases would produce output directories that reference today's date like this:
      /output/2014/3/5/
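
      As a rough illustration of the difference being requested, the sketch below formats the same date with and without zero padding using java.time. The class and method names are hypothetical and are not part of Flume; the sketch only shows the two output shapes, not how Flume resolves its aliases.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not Flume code.
final class DatePathDemo {
    // "MM" and "dd" zero-pad to two digits.
    private static final DateTimeFormatter PADDED =
            DateTimeFormatter.ofPattern("/yyyy/MM/dd/");
    // "M" and "d" use the minimum number of digits.
    private static final DateTimeFormatter UNPADDED =
            DateTimeFormatter.ofPattern("/yyyy/M/d/");

    static String paddedPath(int year, int month, int day) {
        return "/output" + LocalDate.of(year, month, day).format(PADDED);
    }

    static String unpaddedPath(int year, int month, int day) {
        return "/output" + LocalDate.of(year, month, day).format(UNPADDED);
    }

    public static void main(String[] args) {
        System.out.println(paddedPath(2014, 3, 5));   // /output/2014/03/05/
        System.out.println(unpaddedPath(2014, 3, 5)); // /output/2014/3/5/
    }
}
```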

      This would be particularly helpful when importing the data into either Hive or Impala.

      First, Impala has no way to pad partition values, so currently the only way to handle padded directories is to import the data with Hive and then use Impala to access it (short of writing custom code).

      Second, padding partitions in Hive or Impala causes problems; for example, partition pruning is not possible on padded partitions.

      The following is an example of a typical workflow:

      Data is imported into HDFS using Flume with a sink configured as follows:
      agent.sinks.snk_avro_snappy.hdfs.path = hdfs://hdfs/avro/year=%Y/month=%m/day=%d

      Impala reads the data as follows:
      create external table TestAvro (.....)
      partitioned by (Year int, Month int, Day int) stored as avro
      location '/avro';
      alter table TestAvro add if not exists partition(Year=cast(year(to_date(now())) as int), Month=cast(month(to_date(now())) as int), Day=cast(day(to_date(now())) as int));

      Flume saves the output as:
      hdfs://hdfs/avro/year=2014/month=12/day=01

      Impala, however, expects the partition at:
      hdfs://hdfs/avro/year=2014/month=12/day=1

      This feature request is therefore to let Flume write data into a directory named after today's date with no padding on the day or month field.
      Implementation details are not important; for example, a macro that simply removes padding could be added instead of changing the existing date aliases.
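
      One way to picture the "macro that removes padding" idea is a post-processing step over the already-resolved path. The sketch below is an assumption for illustration, not the approach taken in the attached patch: the class name, method name and regular expression are all hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical post-processing step for illustration only; not Flume code.
final class PaddingStripper {
    // Matches leading zeros that begin a path segment or follow '=',
    // provided at least one more digit comes after them.
    // Caveat: this also affects any other segment that starts with "0<digit>".
    private static final Pattern LEADING_ZEROS =
            Pattern.compile("(?<=[/=])0+(?=\\d)");

    static String stripPadding(String path) {
        return LEADING_ZEROS.matcher(path).replaceAll("");
    }

    public static void main(String[] args) {
        // prints hdfs://hdfs/avro/year=2014/month=12/day=1
        System.out.println(stripPadding("hdfs://hdfs/avro/year=2014/month=12/day=01"));
    }
}
```

      Because values such as month=10 or day=20 do not begin with a zero, they pass through unchanged.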

        Attachments

        1. Flume-2570.patch
          4 kB
          Peter Leckie
        2. patch
          4 kB
          Peter Leckie

            People

            • Assignee:
              petel Peter Leckie
              Reporter:
              petel Peter Leckie