MAHOUT-1042

Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6, 0.7
    • Fix Version/s: 0.8
    • Labels: None

      Description

      While profiling the PartialMultiplyMapper-Reducer job, we noticed a hotspot in the reducer task: the org.apache.mahout.math.RandomAccessSparseVector.assign method consumed more than 40% of the CPU time. For profiling we used the script provided in the Mahout examples for running the ASF Email recommendations. The hotspot comes from the use of the Vector.plus(Vector x) method in the AggregateAndRecommendReducer class. The pattern used is VectorA = VectorA.plus(VectorB). Because the result immediately replaces VectorA, VectorA does not have to be cloned using the assign method. The attached patch addresses the hotspot by eliminating the cloning in this case for the plus and times methods. The patch retains functionality (we verified the output with and without the patch) and speeds up execution of the PartialMultiplyMapper-Reducer job by more than 10X on x86 architectures.
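
To illustrate the pattern described above, here is a minimal sketch of the difference between a copying sum and an in-place sum. ToySparseVector and its methods are hypothetical stand-ins backed by a HashMap. They are not Mahout's RandomAccessSparseVector and not the code in the attached patch. The sketch only assumes that a copying plus first duplicates every entry of the left operand, which is the cost the description attributes to assign.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy sparse vector for illustration; not a Mahout class. */
final class ToySparseVector {
    // Only non-zero entries are stored, keyed by index.
    private final Map<Integer, Double> values = new HashMap<>();

    double get(int index) {
        return values.getOrDefault(index, 0.0);
    }

    void set(int index, double value) {
        if (value == 0.0) {
            values.remove(index);
        } else {
            values.put(index, value);
        }
    }

    int nonZeros() {
        return values.size();
    }

    /** Copying sum: duplicates every entry of this vector, then adds other. */
    ToySparseVector plus(ToySparseVector other) {
        ToySparseVector result = new ToySparseVector();
        result.values.putAll(this.values); // full copy on every call
        result.addInPlace(other);
        return result;
    }

    /** In-place sum: touches only the non-zero entries of other. */
    ToySparseVector addInPlace(ToySparseVector other) {
        for (Map.Entry<Integer, Double> e : other.values.entrySet()) {
            set(e.getKey(), get(e.getKey()) + e.getValue());
        }
        return this;
    }
}
```

With these hypothetical methods, an accumulation loop written as sum = sum.plus(next) copies the growing sum on every iteration, while sum.addInPlace(next) does work proportional only to the non-zero entries of next. Both produce the same values, which matches the description's observation that the clone is unnecessary when the result overwrites the left operand.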

        Attachments

        1. Mahout_1042.patch
          5 kB
          Bhaskar Devireddy
        2. MAHOUT-1042.patch
          6 kB
          Sebastian Schelter

            People

            • Assignee: Sebastian Schelter (ssc)
            • Reporter: Bhaskar Devireddy (bhaskar.devireddy)
            • Votes: 0
            • Watchers: 3
