This enhancement would also be useful for minimizing wasted space when writing to HDFS, by ensuring that the resulting files are at or just under the HDFS block size.
Just FYI, there is no (block) space waste if the data is smaller than a block. HDFS does not burn the entire block size on the DN when the block data doesn't fill the block. In other words, a 4K file in HDFS is stored as a single 4K block and occupies 4K on disk, and a 65MB file is made up of a "full" 64MB block plus a 1MB block, occupying 65MB on disk (see the sketch below).
Maybe you're referring to space waste in the NN memory? If so, that's the same issue the description refers to with "during the light load periods would end up creating lots of small files." That is a degenerate situation and should be avoided.
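To see this concretely, here is a minimal sketch (assuming a Hadoop client on the classpath and a cluster reachable via fs.defaultFS; the class name is made up) that lists a file's blocks. For the 65MB file above it would print one 64MB block and one 1MB block, since each BlockLocation reports the bytes actually stored, not the configured block size:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizes {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path(args[0]);          // e.g. the 65MB file above
            FileStatus st = fs.getFileStatus(p);
            // Each BlockLocation's length is the actual data in that block,
            // so only the last block of a file comes up short.
            for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
                System.out.println("offset=" + b.getOffset()
                        + " length=" + b.getLength());
            }
        }
    }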
For our usage it might actually be optimal to have a size trigger with an optional time trigger in addition (i.e., roll over when the size trigger is hit, or when the configured time elapses before the size trigger is reached).
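Something like this minimal sketch captures that semantics (hypothetical names, not the existing implementation):

    // Rolls when either a byte-count threshold or a time bound is hit.
    public class RollTrigger {
        private final long maxBytes;      // e.g. just under the HDFS block size
        private final long maxAgeMillis;  // optional time bound; <= 0 disables it
        private long bytesWritten;
        private long openedAtMillis;

        public RollTrigger(long maxBytes, long maxAgeMillis) {
            this.maxBytes = maxBytes;
            this.maxAgeMillis = maxAgeMillis;
            reset();
        }

        public void recordAppend(long numBytes) {
            bytesWritten += numBytes;
        }

        // Roll when the size trigger is hit, or when the configured time
        // elapses before the size trigger is reached.
        public boolean shouldRoll() {
            if (bytesWritten >= maxBytes) return true;
            return maxAgeMillis > 0
                && System.currentTimeMillis() - openedAtMillis >= maxAgeMillis;
        }

        // Call after rolling to start tracking the new file.
        public void reset() {
            bytesWritten = 0;
            openedAtMillis = System.currentTimeMillis();
        }
    }

Rolling on whichever trigger fires first bounds both file size and data latency: under heavy load files roll on size, and under light load they roll on time instead of staying open indefinitely.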
This is usually what people really want when they think about it. +1.