Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Not A Problem
0.4
None
None
Description
CooccurrencesMapper is computationally expensive, so we could run it inside a MultithreadedMapper instead of as a plain Mapper.
The job setup would change as follows.
original:
if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  Job pairwiseSimilarity = prepareJob(weightsPath, pairwiseSimilarityPath,
      SequenceFileInputFormat.class,
      CooccurrencesMapper.class, WeightedRowPair.class, Cooccurrence.class,
      SimilarityReducer.class, SimilarityMatrixEntryKey.class, MatrixEntryWritable.class,
      SequenceFileOutputFormat.class);

  Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
  pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME, distributedSimilarityClassname);
  pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);

  pairwiseSimilarity.waitForCompletion(true);
}
new:
if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  // MultithreadedMapper must be the job's mapper class for the thread settings
  // below to take effect; it delegates each record to CooccurrencesMapper.
  Job pairwiseSimilarity = prepareJob(weightsPath, pairwiseSimilarityPath,
      SequenceFileInputFormat.class,
      MultithreadedMapper.class, WeightedRowPair.class, Cooccurrence.class,
      SimilarityReducer.class, SimilarityMatrixEntryKey.class, MatrixEntryWritable.class,
      SequenceFileOutputFormat.class);

  Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
  pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME, distributedSimilarityClassname);
  pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);

  // Run CooccurrencesMapper on several threads within each map task.
  // numMapThreads is not defined in this snippet; it would have to be supplied by the caller.
  MultithreadedMapper.setMapperClass(pairwiseSimilarity, CooccurrencesMapper.class);
  MultithreadedMapper.setNumberOfThreads(pairwiseSimilarity, numMapThreads);

  // Write block-compressed (gzip) SequenceFile output.
  SequenceFileOutputFormat.setCompressOutput(pairwiseSimilarity, true);
  SequenceFileOutputFormat.setOutputCompressorClass(pairwiseSimilarity, GzipCodec.class);
  SequenceFileOutputFormat.setOutputCompressionType(pairwiseSimilarity, CompressionType.BLOCK);

  pairwiseSimilarity.waitForCompletion(true);
}
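
For readers unfamiliar with the idea, here is a rough sketch in plain Java of what running a map function on several threads inside one task amounts to: a pool of workers applies the function to records and writes to one synchronized output. It is only an illustration under simplifying assumptions, not Hadoop's or Mahout's implementation; the class MultithreadedMapSketch and the method mapInParallel are hypothetical names, and squaring integers stands in for the real per-row cooccurrence work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical illustration only; not part of Hadoop or Mahout.
public class MultithreadedMapSketch {

    // Applies mapFn to every record using numThreads worker threads that
    // share one synchronized output list. Output order is not guaranteed,
    // and mapFn must be safe to call from several threads at once.
    static <I, O> List<O> mapInParallel(List<I> records, Function<I, O> mapFn, int numThreads) {
        List<O> output = Collections.synchronizedList(new ArrayList<O>());
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (I record : records) {
            pool.execute(() -> output.add(mapFn.apply(record)));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("map tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for map tasks", e);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        // Stand-in for an expensive per-row computation.
        List<Integer> squares = mapInParallel(rows, x -> x * x, 4);
        System.out.println(squares.size() + " records mapped");
    }
}
```

The extra threads only help when the per-record work dominates, and every record still passes through the shared, synchronized output, which is the same trade-off to weigh for the change proposed above.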