  1. Ignite
  2. IGNITE-4262 Hadoop Accelerator performance improvements.
  3. IGNITE-4270

Hadoop: optionally stripe mapper output for every partition.

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8
    • Fix Version/s: 2.0
    • Component/s: hadoop

    Description

      Currently we have R maps for M mappers, where R is the number of reducers. As a result, many mappers write to the same concurrent offheap data structure and lose time to contention.
      Let's add an option to create R * M maps, so that every mapper has a dedicated map for every reducer. This will eliminate almost all concurrency overhead.
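
To make the R * M layout concrete, here is a minimal Java sketch. The class `StripedMaps` and its methods are hypothetical names chosen for illustration and are not taken from the Ignite code base; a plain `ArrayList` stands in for each offheap map. The point is only that a (mapper, reducer) pair selects exactly one map, so no two mappers ever write to the same structure.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the R * M layout; not Ignite's actual classes. */
public class StripedMaps {
    /** Number of reducers (R). */
    private final int reducers;

    /** M * R output maps; each inner list stands in for one offheap map. */
    private final List<List<String>> maps;

    public StripedMaps(int mappers, int reducers) {
        this.reducers = reducers;
        maps = new ArrayList<>(mappers * reducers);

        for (int i = 0; i < mappers * reducers; i++)
            maps.add(new ArrayList<>());
    }

    /** Returns the map dedicated to the given (mapper, reducer) pair. */
    public List<String> mapFor(int mapper, int reducer) {
        return maps.get(mapper * reducers + reducer);
    }

    /** Total number of maps, equal to M * R. */
    public int size() {
        return maps.size();
    }
}
```

With 3 mappers and 4 reducers this allocates 12 maps instead of 4, trading memory for the absence of shared writes.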

      Design:
      1) Every mapper works with its own set of "remote" output maps;
      2) These maps are essentially not "maps" but IO messages, which we fill up to a certain threshold;
      3) Once filled, a message is sent to the remote node;
      4) The async shuffle thread is no longer needed in this architecture.
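
The design steps above can be sketched roughly as follows. This is an assumption-laden illustration, not the implementation: `MapperOutput`, `write`, `close`, and the `sender` callback are hypothetical names, the threshold is counted in records rather than bytes for simplicity, and records are plain strings. Each mapper owns one instance, so no synchronization is needed, and the mapper thread itself sends a message once it is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical per-mapper output; one instance per mapper, never shared. */
public class MapperOutput {
    /** Number of records after which a message is sent (assumed unit). */
    private final int threshold;

    /** Callback that delivers a filled message for a reducer to its node. */
    private final BiConsumer<Integer, List<String>> sender;

    /** One pending message per reducer. */
    private final List<List<String>> msgs;

    public MapperOutput(int reducers, int threshold, BiConsumer<Integer, List<String>> sender) {
        this.threshold = threshold;
        this.sender = sender;
        msgs = new ArrayList<>(reducers);

        for (int i = 0; i < reducers; i++)
            msgs.add(new ArrayList<>());
    }

    /** Appends a record to the reducer's message and sends it once full. */
    public void write(int reducer, String rec) {
        List<String> msg = msgs.get(reducer);

        msg.add(rec);

        if (msg.size() >= threshold)
            flush(reducer);
    }

    /** Sends all partially filled messages when the mapper finishes. */
    public void close() {
        for (int r = 0; r < msgs.size(); r++) {
            if (!msgs.get(r).isEmpty())
                flush(r);
        }
    }

    /** Hands the current message to the sender and starts a new one. */
    private void flush(int reducer) {
        sender.accept(reducer, msgs.get(reducer));
        msgs.set(reducer, new ArrayList<>());
    }
}
```

Because sending happens inline in `write`, there is no separate shuffle thread to fall behind; the cost is that a slow send now blocks the mapper that triggered it.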

      As a result, we reduce contention, remove the slowdown caused by a single shuffle thread that cannot send messages fast enough, and remove unnecessary intermediate sorting.

      NB! Be careful with "combiner" case and with "external" execution.

            People

              Assignee: Vladimir Ozerov (vozerov)
              Reporter: Vladimir Ozerov (vozerov)