Details
Description
I am trying to group my data and store in hdfs with a folder for each 'name' and subfolders for each 'YearMonth' under each name folder.
Input:
(Date) (name) (col3) (col4)
2015-02-02 abc y z
2016-01-02 xyz i j
2015-03-02 abc f b
2015-02-06 abc y z
2016-03-02 xyz a q
Expected out in hdfs:
abc folder
->201502 subfolder
2015-02-02 abc y z
2015-02-06 abc y z
->201503 subfolder
2015-03-02 abc f b
xyz folder
->201601
2016-01-02 xyz i j
->201603
2016-03-02 xyz a q
I am not sure of how to use the Multistorage option on Name column after grouping the tuples by date.
Any help is appreciated.