Hadoop Map/Reduce / MAPREDUCE-300

Ability to thread task execution


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • None
    • None
    • None
    • None
    • All

    Description

      Currently Hadoop spawns a single-threaded JVM for each task. This works well for many jobs, but it does not make full use of slaves that have many cores (machines with more cores are becoming more cost-effective every day) when those slaves run jobs that need many gigabytes of read-only, in-memory resources to maximize throughput. Because every task runs in its own JVM, that data must be loaded redundantly into each JVM, which limits how many tasks can run in parallel on a machine even though more CPUs are available.
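A minimal plain-Java sketch of the idea, assuming the large read-only data can be modelled as an immutable in-process table: the table is loaded once per JVM and several task threads read it concurrently, instead of each task's JVM loading its own copy. This is not the patch mentioned in this issue and uses no Hadoop API; the names `SharedResourceTasks`, `SharedResource`, `runTask`, and `runAll` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedResourceTasks {

    // Hypothetical stand-in for a multi-gigabyte read-only lookup table.
    static final class SharedResource {
        static final AtomicInteger LOADS = new AtomicInteger();

        // Initialization-on-demand holder: the JVM initializes Holder, and so
        // builds the table, exactly once, on first access, without explicit locks.
        private static final class Holder {
            static final Map<String, Integer> TABLE = load();
        }

        private static Map<String, Integer> load() {
            LOADS.incrementAndGet();
            Map<String, Integer> m = new HashMap<>();
            m.put("a", 1);
            m.put("b", 2);
            m.put("c", 3);
            // Unmodifiable, so concurrent readers never see a mutation.
            return Collections.unmodifiableMap(m);
        }

        static Map<String, Integer> get() {
            return Holder.TABLE;
        }
    }

    // Hypothetical task: sums lookups of its input records in the shared table.
    static int runTask(List<String> records) {
        Map<String, Integer> table = SharedResource.get();
        int sum = 0;
        for (String r : records) {
            sum += table.getOrDefault(r, 0);
        }
        return sum;
    }

    // Runs one task per input split on a fixed pool of threads in this JVM.
    public static List<Integer> runAll(List<List<String>> splits, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> split : splits) {
                futures.add(pool.submit(() -> runTask(split)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> splits = List.of(List.of("a", "b"), List.of("c"), List.of("a", "x"));
        System.out.println(runAll(splits, 4));          // [3, 3, 1]
        System.out.println(SharedResource.LOADS.get()); // 1
    }
}
```

The holder idiom is one way to get a lazily built, once-per-JVM resource; the memory saving comes from all threads referencing the same immutable object rather than from the pool itself.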

      Adding this ability would give Hadoop users the flexibility to balance maximizing memory usage and throughput against task segmentation (running each task in its own JVM).

      Note: This is a blocking issue for porting my organization's processes to Hadoop. I am currently testing a patch that leaves the existing single-threaded behavior intact. All synchronization is done through wrapper classes and helper methods, so it should add no overhead to non-threaded processes.
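A hypothetical sketch of the wrapper-class approach described in the note, assuming task output goes through a collector-style interface: a synchronizing wrapper is applied only when more than one thread shares the sink, so the single-threaded path calls the raw object with no locking. The interface `Collector` and the names `ListCollector`, `SynchronizedCollector`, `forThreads`, and `run` are illustrative assumptions, not Hadoop's `OutputCollector` and not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CollectorWrappers {

    // Hypothetical output sink, loosely modelled on a task's output collector.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Plain, unsynchronized collector (ArrayList is not thread-safe).
    static final class ListCollector<K, V> implements Collector<K, V> {
        final List<String> out = new ArrayList<>();

        @Override
        public void collect(K key, V value) {
            out.add(key + "=" + value);
        }
    }

    // Wrapper that serializes all calls into a non-thread-safe delegate.
    static final class SynchronizedCollector<K, V> implements Collector<K, V> {
        private final Collector<K, V> delegate;

        SynchronizedCollector(Collector<K, V> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized void collect(K key, V value) {
            delegate.collect(key, value);
        }
    }

    // Pay for locking only when more than one thread will share the collector.
    static <K, V> Collector<K, V> forThreads(Collector<K, V> raw, int threads) {
        return threads > 1 ? new SynchronizedCollector<>(raw) : raw;
    }

    // Runs `threads` workers that each emit `recordsPerThread` records; returns the count.
    static int run(int threads, int recordsPerThread) throws Exception {
        ListCollector<Integer, Integer> raw = new ListCollector<>();
        Collector<Integer, Integer> out = forThreads(raw, threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> jobs = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                jobs.add(() -> {
                    for (int i = 0; i < recordsPerThread; i++) {
                        out.collect(id, i);
                    }
                    return null;
                });
            }
            // Future.get() also guarantees the workers' writes are visible here.
            for (Future<Void> f : pool.invokeAll(jobs)) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return raw.out.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(1, 1000)); // 1000
        System.out.println(run(8, 1000)); // 8000
    }
}
```

Choosing the wrapper at construction time keeps the lock out of the single-threaded code path entirely, which matches the note's goal of leaving non-threaded processes unaffected.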


          People

            Assignee: Unassigned
            Reporter: Holden Robbins (madcow)
            Votes: 0
            Watchers: 3


              Time Tracking

                Original Estimate: 48h
                Remaining Estimate: 48h
                Time Spent: Not Specified