Hadoop Common / HADOOP-93

allow minimum split size configurable


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels: None

    Description

      The current default split size is the block size (32M), and SequenceFile sets it to SequenceFile.SYNC_INTERVAL (2K). We currently have a Map/Reduce application working on crawled documents. Its input consists of 356 sequence files, each around 30G in size. The jobtracker takes forever to launch the job because it needs to generate 356 * 30G / 2K map tasks!
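      The scale of the problem can be checked with a quick arithmetic sketch (pure Java, no Hadoop dependencies; 30G is taken as 30 GiB and 2K as 2048 bytes):

      ```java
      public class SplitCount {
          // Ceiling division: number of splits needed to cover a file.
          public static long splits(long fileSize, long splitSize) {
              return (fileSize + splitSize - 1) / splitSize;
          }

          public static void main(String[] args) {
              long fileSize = 30L * 1024 * 1024 * 1024; // ~30 GB per sequence file
              int files = 356;
              long syncInterval = 2048;                 // SequenceFile.SYNC_INTERVAL (~2K)
              long blockSize = 32L * 1024 * 1024;       // default block size (32M)

              // Splitting at every sync interval: 5,599,395,840 map tasks.
              System.out.println(files * splits(fileSize, syncInterval));
              // Splitting at block boundaries: 341,760 map tasks.
              System.out.println(files * splits(fileSize, blockSize));
          }
      }
      ```

      At the 2K sync interval the job needs billions of map tasks, versus a few hundred thousand at the 32M block size, which is why the jobtracker stalls during job launch.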

      The proposed solution is to make the minimum split size configurable so that the programmer can control the number of map tasks generated.
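      One way such a knob could be exposed is as a site configuration property. The property name and value below are illustrative assumptions, not taken from the attached patch:

      ```xml
      <!-- hadoop-site.xml sketch: raise the minimum split size so each map
           task covers at least one 32M block rather than a 2K sync interval.
           The property name is an assumption for illustration only. -->
      <property>
        <name>mapred.min.split.size</name>
        <value>33554432</value> <!-- 32M in bytes -->
      </property>
      ```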

      Attachments

        1. hadoop-93.fix
          2 kB
          Hairong Kuang

        Activity

          People

            Assignee: cutting (Doug Cutting)
            Reporter: hairong (Hairong Kuang)
            Votes: 0
            Watchers: 0
