MAHOUT-475: Replace Mapper with MultithreadedMapper to run job pairwiseSimilarity

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.4
    • Fix Version/s: 0.4
    • None

    Description

      CooccurrencesMapper is computationally expensive, so the pairwiseSimilarity
      job might benefit from running it inside a MultithreadedMapper instead of
      a plain Mapper.

      The job setup would change as follows.
      original:

          if (shouldRunNextPhase(parsedArgs, currentPhase)) {
            Job pairwiseSimilarity = prepareJob(weightsPath,
                                     pairwiseSimilarityPath,
                                     SequenceFileInputFormat.class,
                                     CooccurrencesMapper.class,
                                     WeightedRowPair.class,
                                     Cooccurrence.class,
                                     SimilarityReducer.class,
                                     SimilarityMatrixEntryKey.class,
                                     MatrixEntryWritable.class,
                                     SequenceFileOutputFormat.class);
      
            Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
            pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME, distributedSimilarityClassname);
            pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
            pairwiseSimilarity.waitForCompletion(true);
          }
      

      new:

          if (shouldRunNextPhase(parsedArgs, currentPhase)) {
            Job pairwiseSimilarity = prepareJob(weightsPath,
                                     pairwiseSimilarityPath,
                                     SequenceFileInputFormat.class,
                                     MultithreadedMapper.class, // wrapper; the real mapper is set via setMapperClass below
                                     WeightedRowPair.class,
                                     Cooccurrence.class,
                                     SimilarityReducer.class,
                                     SimilarityMatrixEntryKey.class,
                                     MatrixEntryWritable.class,
                                     SequenceFileOutputFormat.class);
      
            
            Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
            pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME, distributedSimilarityClassname);
            pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
            MultithreadedMapper.setMapperClass(pairwiseSimilarity, CooccurrencesMapper.class);
            MultithreadedMapper.setNumberOfThreads(pairwiseSimilarity, numMapThreads);
            SequenceFileOutputFormat.setCompressOutput(pairwiseSimilarity, true);
            SequenceFileOutputFormat.setOutputCompressorClass(pairwiseSimilarity, GzipCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(pairwiseSimilarity, CompressionType.BLOCK);
      
            pairwiseSimilarity.waitForCompletion(true);
          }
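
The idea behind the change can be illustrated without Hadoop. The sketch below is a conceptual model only, not Hadoop's implementation: several worker threads pull records from one shared input under a lock, run the expensive map step outside the lock, and write to a synchronized collector. The class name MultithreadedMapSketch, the map() stand-in, and the sample records are all hypothetical. It assumes the map step is CPU-bound and keeps no shared mutable state, which is the situation in which extra threads per map task could plausibly help.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class MultithreadedMapSketch {

  /** Hypothetical stand-in for a CPU-heavy map function: one output per token. */
  static List<String> map(String record) {
    List<String> out = new ArrayList<>();
    for (String token : record.trim().split("\\s+")) {
      if (!token.isEmpty()) {
        out.add(token.toUpperCase());
      }
    }
    return out;
  }

  /** Maps all records of one input split using numThreads worker threads. */
  static List<String> run(List<String> records, int numThreads) {
    Iterator<String> reader = records.iterator(); // shared input, guarded by a lock
    List<String> collector = Collections.synchronizedList(new ArrayList<>());
    List<Thread> workers = new ArrayList<>();

    for (int i = 0; i < numThreads; i++) {
      Thread worker = new Thread(() -> {
        while (true) {
          String record;
          synchronized (reader) {          // only one thread reads at a time
            if (!reader.hasNext()) {
              return;                      // input exhausted, worker exits
            }
            record = reader.next();
          }
          collector.addAll(map(record));   // expensive work happens outside the lock
        }
      });
      workers.add(worker);
      worker.start();
    }

    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }

    // Output order depends on thread scheduling, so sort for a stable result.
    List<String> sorted = new ArrayList<>(collector);
    Collections.sort(sorted);
    return sorted;
  }

  public static void main(String[] args) {
    System.out.println(run(List.of("a b", "b c", "c"), 3));
  }
}
```

Because the order in which workers finish is not deterministic, anything downstream must not rely on output order within a split. In a MapReduce job the shuffle sorts map output by key anyway, so this is usually harmless, but a mapper that touches static or otherwise shared state would need its own synchronization.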
      

      Attachments

        1. patch_985097.txt
          6 kB
          Han Hui Wen
        2. after_patch_20100813.jpg
          108 kB
          Han Hui Wen

          People

            Assignee: Sean R. Owen (srowen)
            Reporter: Han Hui Wen (huiwenhan)
