Hadoop Map/Reduce / MAPREDUCE-640

Make the Reader for sampling TeraSort input multithreaded

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: examples
    • Labels: None

    Description

      The TeraSort sampler, which reads from multiple splits to compute the partition information, can be made multithreaded, with multiple threads reading from multiple splits concurrently. That should improve performance, and it would also let us sample more records to arrive at better partition information.
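
      The idea can be illustrated with a minimal sketch. This is not the attached patch and does not use the Hadoop API: splits are modeled as in-memory lists of keys, and `ParallelSamplerSketch`, `sampleSplit`, and `computeSplitPoints` are hypothetical names chosen for illustration. Each split is sampled by its own task on a fixed-size thread pool, the samples are merged and sorted, and evenly spaced keys are picked as partition boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSamplerSketch {

    // Hypothetical stand-in for reading the first recordsPerSplit keys of one split.
    // A real sampler would open a record reader on the split instead.
    static List<String> sampleSplit(List<String> split, int recordsPerSplit) {
        int n = Math.min(recordsPerSplit, split.size());
        return new ArrayList<>(split.subList(0, n));
    }

    // Samples every split concurrently, then derives numPartitions - 1 split points.
    static List<String> computeSplitPoints(List<List<String>> splits,
                                           int recordsPerSplit,
                                           int numPartitions,
                                           int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // Submit one sampling task per split; tasks run in parallel on the pool.
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> split : splits) {
                Callable<List<String>> task = () -> sampleSplit(split, recordsPerSplit);
                futures.add(pool.submit(task));
            }

            // Merge on the calling thread, so no shared mutable state is needed.
            List<String> samples = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                samples.addAll(future.get());
            }

            List<String> splitPoints = new ArrayList<>();
            if (samples.isEmpty()) {
                return splitPoints;
            }

            // Sort the merged sample and pick evenly spaced keys as boundaries.
            Collections.sort(samples);
            for (int i = 1; i < numPartitions; i++) {
                splitPoints.add(samples.get(i * samples.size() / numPartitions));
            }
            return splitPoints;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("d", "a"),
                Arrays.asList("c", "b"));
        // Two splits, two records sampled from each, two partitions, two threads.
        System.out.println(computeSplitPoints(splits, 2, 2, 2)); // prints [c]
    }
}
```

      Because each task returns its own list and the merge happens after `Future.get()`, the threads never write to a shared collection. Reading more records per split in this scheme costs wall-clock time roughly in proportion to the slowest split rather than to the sum over all splits, which is what makes a larger sample affordable.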

      Attachments

        1. 4946.patch
          4 kB
          Devaraj Das

          People

            Assignee: Devaraj Das (ddas)
            Reporter: Devaraj Das (ddas)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated: