Although date components are conventionally zero-padded, it would be valuable if Flume could format them as plain integers, i.e. without zero-padding.
For example, this would apply when using the %y, %m or %d escape sequences to create output directories that reference today's date.
This would be especially helpful when importing the data into Hive or Impala.
First, Impala cannot pad partition values, so currently the only way to handle padded directories is to import the data with Hive and then use Impala to access it (or to write custom code).
Second, padded partitions cause problems in both Hive and Impala; for example, partition pruning is not possible on padded partitions.
A typical workflow looks like this:
Data is imported into HDFS by Flume, using an HDFS sink configured as follows:
agent.sinks.snk_avro_snappy.hdfs.path = hdfs://hdfs/avro/year=%Y/month=%m/day=%d
Impala reads the data as follows:
create external table TestAvro (.....)
partitioned by (Year int, Month int, Day int) stored as avro
location 'hdfs://hdfs/avro';
alter table TestAvro add if not exists partition(
  Year=cast(year(to_date(now())) as int),
  Month=cast(month(to_date(now())) as int),
  Day=cast(day(to_date(now())) as int));
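The directory names behind this workflow can be approximated in a few lines of Java. This is a sketch, not Flume code: it assumes the sink's %Y/%m/%d escapes behave like the zero-padded `java.time` patterns `yyyy`/`MM`/`dd`, and that the int partition values registered above map to unpadded directory names (`M`/`d`). The class name `PartitionPaths` and the date 2015-03-07 are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PartitionPaths {
    // Approximates the sink's %Y/%m/%d escapes: month and day are zero-padded.
    static final DateTimeFormatter PADDED =
            DateTimeFormatter.ofPattern("'year='yyyy'/month='MM'/day='dd");

    // Directory names implied by the int partition columns: no padding.
    static final DateTimeFormatter UNPADDED =
            DateTimeFormatter.ofPattern("'year='yyyy'/month='M'/day='d");

    static String paddedPath(LocalDate date) {
        return "hdfs://hdfs/avro/" + date.format(PADDED);
    }

    static String unpaddedPath(LocalDate date) {
        return "hdfs://hdfs/avro/" + date.format(UNPADDED);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2015, 3, 7); // hypothetical example date
        System.out.println(paddedPath(date));   // hdfs://hdfs/avro/year=2015/month=03/day=07
        System.out.println(unpaddedPath(date)); // hdfs://hdfs/avro/year=2015/month=3/day=7
    }
}
```

The two paths agree only when both the month and the day have two digits, which is why the mismatch shows up for most of the year.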
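Flume therefore saves the output under zero-padded directories (for example month=03/day=07), while Impala, whose partition columns are integers, looks for unpadded ones (for example month=3/day=7), so for any month or day below 10 the two do not match.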
This feature request is therefore to let Flume write data into a directory named after today's date with no zero-padding on the day or month fields.
The implementation details are not important; for example, Flume could add a macro that simply strips the padding, rather than changing the existing date escape sequences.
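A padding-stripping macro of that kind could be as small as the following Java sketch. The helper `stripPadding` is hypothetical and is not part of Flume's API; it assumes the macro runs on the already-expanded path and removes leading zeros from every numeric `key=value` segment, keeping a single `0` when the value is all zeros.

```java
import java.util.regex.Pattern;

public class StripPadding {
    // "=" followed by one or more leading zeros and at least one more digit,
    // where the value ends at "/" or at the end of the path.
    private static final Pattern PADDED_VALUE = Pattern.compile("=0+(\\d+)(?=/|$)");

    // Hypothetical helper: strips zero-padding from numeric path segment values.
    static String stripPadding(String path) {
        return PADDED_VALUE.matcher(path).replaceAll("=$1");
    }

    public static void main(String[] args) {
        // Prints hdfs://hdfs/avro/year=2015/month=3/day=7
        System.out.println(stripPadding("hdfs://hdfs/avro/year=2015/month=03/day=07"));
    }
}
```

Because it works on the expanded path, this approach leaves the existing date escapes untouched; the trade-off is that it also rewrites any other zero-padded numeric segment in the path, so a real macro would likely need to be scoped to the date components.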