In our production environment, we generate Avro files to track user behavior events, and several new files are created every hour. Once a day, we run a MapReduce job to analyze them. With AvroKeyValueInputFormat, the job starts a large number of small mappers because the input consists of many small Avro files.
A combine file input format would be very helpful in this case.
Hadoop already provides such implementations for sequence files and text files (CombineSequenceFileInputFormat and CombineTextInputFormat). This Jira proposes a CombineAvroKeyValueFileInputFormat class that does the same for Avro key/value files.
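
To illustrate why combining helps, here is a minimal, self-contained sketch of the idea behind a combine input format: many small files are packed into fewer splits, so fewer mappers are started. It is a simplified model and not Hadoop's actual algorithm, which also takes node and rack locality into account. The class name SplitPackingSketch, the method pack, and the size limit parameter are hypothetical names chosen for this example only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of split packing; not Hadoop's real implementation.
final class SplitPackingSketch {

    // Greedily groups file sizes (in bytes) into combined splits whose total size
    // does not exceed maxSplitBytes. In this sketch, a single file larger than the
    // limit simply ends up in a split of its own.
    static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current split if adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        // Emit the last, possibly partially filled, split.
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }
}
```

Without combining, each small file would be processed by at least one mapper of its own. With packing as sketched above, the number of mappers depends on the total input size and the configured maximum split size rather than on the number of files.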