  1. Apache Avro
  2. AVRO-2354

Add CombineAvroKeyValueFileInputFormat in avro-mapred to combine small avro keyvalue files into combineSplit

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0
    • Component/s: java
    • Labels:
      None

      Description

      In our production environment, we generate Avro files to track user behavior events. Several Avro files are created every hour. Each day we run a MapReduce job to analyze them. With AvroKeyValueInputFormat, this starts a large number of small mappers, because each small Avro file gets its own mapper.

      A combine file input format would be very helpful in this case.

      Hadoop already provides such implementations for sequence files and text files. This Jira proposes a CombineAvroKeyValueFileInputFormat class that does the same for Avro key-value files.
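
      The effect of combining can be illustrated with a small, self-contained sketch. It is not the code of the proposed class and not Hadoop's actual split algorithm (which also considers node and rack locality). It only shows the basic idea of packing many small files into fewer splits under a size limit. The class name CombineSplitSketch, the 8 MB file size, and the 64 MB limit are hypothetical values chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only: greedy packing of small files into combined splits.
// Hypothetical class; Hadoop's CombineFileInputFormat is locality-aware and more involved.
public class CombineSplitSketch {

    /**
     * Packs file sizes (in bytes) into groups, in input order, so that the
     * total of each group does not exceed maxSplitBytes. A single file larger
     * than the limit ends up alone in its own group.
     */
    static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current split if adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed workload: 24 hourly files of 8 MB each (one day of data).
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < 24; i++) {
            sizes.add(8L * 1024 * 1024);
        }
        List<List<Long>> splits = pack(sizes, 64L * 1024 * 1024);
        System.out.println("files=" + sizes.size() + " combinedSplits=" + splits.size());
    }
}
```

      With one mapper per file, this assumed workload would start 24 mappers. Packed under a 64 MB limit, it needs 3. In an actual job, the input format would presumably be selected with Job.setInputFormatClass, and the limit set through the Hadoop property mapreduce.input.fileinputformat.split.maxsize.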

              People

              • Assignee:
                dkulp Daniel Kulp
                Reporter:
                suxingfate Wang, Xinglong
              • Votes:
                0
                Watchers:
                3