Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: 0.4
None
None
Environment: Linux Debian 5.0.5, 12GB RAM, Hadoop 20.3 installation
Description
Hello,
When I run a RowSimilarityJob on a 146682 x 138351 matrix, the job gets through the RowWeightMapper and WeightedOccurrencesPerColumnReducer, then hangs during the CooccurrencesMapper even though the map tasks are reported as 100% complete.
The command I use to run the job is:
hadoop jar mahout-core-0.4-job.jar org.apache.mahout.math.hadoop.similarity.RowSimilarityJob \
  -Dmapred.input.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaCompressedDocumentsMatrix \
  -Dmapred.output.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaDocumentSimilarityMatrix \
  -Dmapred.reduce.tasks=8 \
  -Dmapred.map.tasks=200 \
  -Dmapred.job.name=LDA_ROW_SIMILARITY_TEST \
  --tempDir /user/maya.hristakeva/temp/lda/5 \
  --numberOfColumns 138351 \
  --similarityClassname org.apache.mahout.math.hadoop.similarity.vector.DistributedEuclideanDistanceVectorSimilarity \
  --maxSimilaritiesPerRow 10
The output of the mappers that are reported as 100% complete but appear to hang is:
syslog logs
01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: bufstart = 29085149; bufend = 39038598; bufvoid = 99614720
2011-01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: kvstart = 65461; kvend = 327605; length = 327680
2011-01-05 18:30:06,241 INFO org.apache.hadoop.mapred.MapTask: Finished spill 94
2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: bufstart = 39038598; bufend = 48983989; bufvoid = 99614720
2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: kvstart = 327605; kvend = 262068; length = 327680
2011-01-05 18:30:14,528 INFO org.apache.hadoop.mapred.MapTask: Finished spill 95
2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: bufstart = 48983989; bufend = 58929384; bufvoid = 99614720
2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262068; kvend = 196531; length = 327680
2011-01-05 18:30:22,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 96
.
.
.
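The figures in this log excerpt are consistent with the map-side sort buffer running at what I believe are the Hadoop 0.20 defaults (io.sort.mb=100, io.sort.record.percent=0.05, io.sort.spill.percent=0.80). The report does not state these settings, so treat them as assumptions, along with the 16 bytes of accounting space per record taken from my reading of the 0.20 MapTask code. The Python sketch below recomputes bufvoid and length from those values and derives the record count and byte size of each spill shown above.

```python
# Sketch only: the settings below are ASSUMED Hadoop 0.20 defaults,
# not values stated in the report.
RECSIZE = 16  # assumed bytes of accounting space per record in 0.20 MapTask


def sort_buffer_layout(sort_mb, record_pct, spill_pct):
    """Return (bufvoid, record_capacity, soft_record_limit).

    sort_mb    -- io.sort.mb in megabytes
    record_pct -- io.sort.record.percent as an integer percentage
    spill_pct  -- io.sort.spill.percent as an integer percentage
    """
    max_mem = sort_mb << 20                      # total sort memory in bytes
    record_bytes = max_mem * record_pct // 100   # share reserved for accounting
    record_bytes -= record_bytes % RECSIZE       # round down to whole records
    bufvoid = max_mem - record_bytes             # bytes left for serialized data
    record_capacity = record_bytes // RECSIZE    # "length" in the log
    soft_record_limit = record_capacity * spill_pct // 100
    return bufvoid, record_capacity, soft_record_limit


# (bufstart, bufend, kvstart, kvend) copied from the log excerpt
SPILLS = [
    (29085149, 39038598, 65461, 327605),
    (39038598, 48983989, 327605, 262068),
    (48983989, 58929384, 262068, 196531),
]


def spill_sizes(spills, bufvoid, record_capacity):
    """Return (records, bytes) per spill, allowing for circular wrap-around."""
    sizes = []
    for bufstart, bufend, kvstart, kvend in spills:
        records = (kvend - kvstart) % record_capacity
        nbytes = (bufend - bufstart) % bufvoid
        sizes.append((records, nbytes))
    return sizes


bufvoid, record_capacity, soft_record_limit = sort_buffer_layout(100, 5, 80)
print("bufvoid =", bufvoid, "length =", record_capacity,
      "soft record limit =", soft_record_limit)

per_spill = spill_sizes(SPILLS, bufvoid, record_capacity)
for records, nbytes in per_spill:
    print(f"{records} records, {nbytes} bytes, "
          f"{nbytes / records:.1f} bytes/record")
```

If these assumptions hold, each spill carries roughly 262,000 records of about 38 bytes each, and spills are triggered by the record-accounting space ("record full = true") rather than by the data buffer, which is only about 10% used per spill. With the spill counter already at 96, this one mapper would have emitted on the order of 25 million records, which suggests the task is still writing out a very large map output rather than sitting idle.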
This problem does not occur when I use a toy matrix of 100 x 100, but once I give it the original matrix of ..... the problem is always reproducible.
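One possible reason the toy matrix finishes while the large one does not is output volume. The sketch below assumes, without confirmation from this report, that the CooccurrencesMapper emits one record for every pair of non-zero entries within a column (including each entry paired with itself), so a column with n non-zero entries yields n(n+1)/2 records. The function name and the example column densities are hypothetical and chosen only to show how quickly that count grows.

```python
def pairs_emitted(nonzeros_per_column):
    """Total records emitted under the ASSUMED pairwise scheme.

    nonzeros_per_column -- iterable with the number of non-zero entries
                           in each column of the input matrix
    """
    return sum(n * (n + 1) // 2 for n in nonzeros_per_column)


# Hypothetical worst case for a 100 x 100 toy matrix: every entry non-zero.
toy = pairs_emitted([100] * 100)

# Hypothetical single column that is non-zero in all 146682 rows.
one_dense_column = pairs_emitted([146682])

print("toy matrix, fully dense:", toy)
print("one fully dense column of the large matrix:", one_dense_column)
```

Under that assumption the cost is driven by the densest columns, not by the overall matrix size, so checking the distribution of non-zero entries per column would show whether this explanation applies here.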
Any ideas on what could be causing this?
Thanks,
Maya Hristakeva