  1. Hama
  2. HAMA-757

The partitioning job output should be unsplittable


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 0.6.1
    • Fix Version: 0.6.2
    • Component: bsp core
    • Labels: None

    Description

      When the sequence files written by the partitioning job are large (larger than two HDFS blocks), the second round of the job, which uses these sequence files as input, starts more tasks than the client requested. Sometimes this unpredictability makes the job exceed the cluster's slot capacity.
      In a real project, I implemented a new InputFormat that is marked as unsplittable to solve the problem. Is there a better way?
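
The effect described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `SplitSketch` and `countSplits` are hypothetical names, not Hama or Hadoop classes, and the split rule is simplified to one split per block for splittable files. In Hadoop-style `FileInputFormat` classes the corresponding hook is a method named `isSplitable`; an unsplittable input format would presumably override it to return `false`, so each partition file yields exactly one task.

```java
// Simplified model of how input splits (and therefore tasks) are counted.
// Hypothetical illustration only; not the Hama or Hadoop implementation.
class SplitSketch {

    /**
     * Counts the splits produced for a set of input files.
     *
     * @param fileSizes  size of each partition file, in bytes
     * @param blockSize  HDFS block size, in bytes
     * @param splittable whether the input format allows a file to be split
     */
    static int countSplits(long[] fileSizes, long blockSize, boolean splittable) {
        int splits = 0;
        for (long size : fileSizes) {
            if (splittable) {
                // Assumption: one split per block (ceiling division), at least one per file.
                long perFile = (size + blockSize - 1) / blockSize;
                splits += (int) Math.max(1, perFile);
            } else {
                // Unsplittable: the whole file is a single split, so one task per file.
                splits += 1;
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Three partition files of 300 MB each, with a 128 MB block size.
        long[] partitions = {300 * mb, 300 * mb, 300 * mb};
        System.out.println("splittable:   " + countSplits(partitions, 128 * mb, true));
        System.out.println("unsplittable: " + countSplits(partitions, 128 * mb, false));
    }
}
```

In this model, three partition files intended for three tasks produce nine splits when the files are splittable, and three splits when they are not, which matches the mismatch between requested and started tasks reported here.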

      Attachments

        1. HAMA-757.patch
          2 kB
          MaoYuan Xian

        Activity

          People

            Assignee: MaoYuan Xian (kennethxian)
            Reporter: MaoYuan Xian (kennethxian)
            Votes: 0
            Watchers: 3
