Crunch (Retired) / CRUNCH-165

Pipelines should automatically use CombineFileInputFormat where input consists of many small files


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.8.0
    • Component/s: Core
    • Labels: None

    Description

      Hive introduced a feature in HIVE-74 whereby CombineFileInputFormat is used when the input data consists of many small files. This makes the resulting MapReduce jobs more efficient, because each mapper gets more data to process instead of one tiny file per task. This would be a nice feature for Crunch to have, too.
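      To make the idea concrete, here is a minimal, self-contained sketch of the two decisions involved: detecting that the input is "many small files", and packing those files into fewer, larger splits. The class SmallFileCombiner, its methods, and the 0.5 small-file threshold are hypothetical and chosen for illustration only; this is not Crunch's or Hadoop's actual implementation, and it does not use the real CombineFileInputFormat API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the idea behind CRUNCH-165.
// Not Crunch's or Hadoop's actual code; names and thresholds are assumptions.
public class SmallFileCombiner {

    // Combining is assumed worthwhile when there is more than one file and the
    // average file size is below a fraction of the block size.
    static boolean shouldCombine(long[] fileSizes, long blockSize, double smallFileFraction) {
        if (fileSizes.length < 2) {
            return false;
        }
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        double average = (double) total / fileSizes.length;
        return average < smallFileFraction * blockSize;
    }

    // Greedily packs consecutive files into splits of at most maxSplitSize bytes.
    // A file larger than maxSplitSize ends up alone in its own split.
    // Returns the file indices assigned to each split.
    static List<List<Integer>> packSplits(long[] fileSizes, long maxSplitSize) {
        List<List<Integer>> splits = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int i = 0; i < fileSizes.length; i++) {
            // Close the current split if this file would push it over the limit.
            if (!current.isEmpty() && currentBytes + fileSizes[i] > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(i);
            currentBytes += fileSizes[i];
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;     // assumed 128 MB block size
        long[] sizes = new long[1000];
        Arrays.fill(sizes, 1024L * 1024);        // 1,000 input files of 1 MB each
        if (shouldCombine(sizes, blockSize, 0.5)) {
            List<List<Integer>> splits = packSplits(sizes, blockSize);
            // Prints "8 splits instead of 1000": 128 files fit in each split.
            System.out.println(splits.size() + " splits instead of " + sizes.length);
        }
    }
}
```

      With one split per file, this input would launch 1,000 map tasks; packing it into 128 MB splits needs only 8. The greedy, order-based packing is a deliberate simplification: Hadoop's CombineFileInputFormat also groups blocks by node and rack so that combined splits stay data-local, which this sketch ignores.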

      Attachments

        1. CRUNCH-165-v4.patch
          25 kB
          Josh Wills
        2. CRUNCH-165-v3.patch
          9 kB
          Josh Wills
        3. CRUNCH-165-jwills.patch
          25 kB
          Josh Wills
        4. CRUNCH-165.patch
          8 kB
          Joseph Adler


          People

            Josh Wills (jwills)
            Dave Beech (dbeech)
            Votes: 0
            Watchers: 4
