Mahout / MAHOUT-577

RowSimilarityJob hangs during CooccurrencesMapper

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.4
    • Fix Version/s: 0.5
    • Component/s: None
    • Labels: None
    • Environment: Debian Linux 5.0.5, 12 GB RAM, Hadoop 20.3 installation

    Description

      Hello,

      When I run a RowSimilarityJob on a 146682 x 138351 matrix, the job gets through the RowWeightMapper and WeightedOccurrencesPerColumnReducer, then hangs in the CooccurrencesMapper even though the map tasks are reported as 100% complete.

      The command I use to run the job is:

      hadoop jar mahout-core-0.4-job.jar org.apache.mahout.math.hadoop.similarity.RowSimilarityJob \
        -Dmapred.input.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaCompressedDocumentsMatrix \
        -Dmapred.output.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaDocumentSimilarityMatrix \
        -Dmapred.reduce.tasks=8 \
        -Dmapred.map.tasks=200 \
        -Dmapred.job.name=LDA_ROW_SIMILARITY_TEST \
        --tempDir /user/maya.hristakeva/temp/lda/5 \
        --numberOfColumns 138351 \
        --similarityClassname org.apache.mahout.math.hadoop.similarity.vector.DistributedEuclideanDistanceVectorSimilarity \
        --maxSimilaritiesPerRow 10

      The output of the mappers that are reported as 100% complete but are still hanging is:

      syslog logs

      2011-01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: bufstart = 29085149; bufend = 39038598; bufvoid = 99614720
      2011-01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: kvstart = 65461; kvend = 327605; length = 327680
      2011-01-05 18:30:06,241 INFO org.apache.hadoop.mapred.MapTask: Finished spill 94
      2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
      2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: bufstart = 39038598; bufend = 48983989; bufvoid = 99614720
      2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: kvstart = 327605; kvend = 262068; length = 327680
      2011-01-05 18:30:14,528 INFO org.apache.hadoop.mapred.MapTask: Finished spill 95
      2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
      2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: bufstart = 48983989; bufend = 58929384; bufvoid = 99614720
      2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262068; kvend = 196531; length = 327680
      2011-01-05 18:30:22,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 96
      .
      .
      .
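      As a reading aid for the log above, here is a minimal sketch that works out how much each spill contains. It assumes that both the byte buffer (bufstart/bufend/bufvoid) and the record index (kvstart/kvend/length) are circular, so sizes are taken modulo the capacity. The helper name spill_sizes is hypothetical, and the input is just the message part of the six lines shown above. Under that reading, each spill holds roughly 262,000 records in about 10 MB (around 38 bytes per record, about 10% of the byte buffer), which is consistent with the "record full = true" messages: the record-count limit, not the byte limit, triggers the spills. The record count per spill is also about 80% of the index length 327680, which would match default io.sort.* settings, though that is an assumption about this cluster's configuration.

```python
import re

# Message portions of the syslog lines above (timestamps omitted).
LOG = """\
MapTask: bufstart = 29085149; bufend = 39038598; bufvoid = 99614720
MapTask: kvstart = 65461; kvend = 327605; length = 327680
MapTask: bufstart = 39038598; bufend = 48983989; bufvoid = 99614720
MapTask: kvstart = 327605; kvend = 262068; length = 327680
MapTask: bufstart = 48983989; bufend = 58929384; bufvoid = 99614720
MapTask: kvstart = 262068; kvend = 196531; length = 327680
"""

BUF = re.compile(r"bufstart = (\d+); bufend = (\d+); bufvoid = (\d+)")
KV = re.compile(r"kvstart = (\d+); kvend = (\d+); length = (\d+)")


def spill_sizes(log):
    """Return (bytes, records, byte_capacity) for each spill in the log.

    Assumption: both buffers are circular, so a span that wraps past the
    end is measured modulo the buffer's capacity.
    """
    bufs = [tuple(map(int, m.groups())) for m in BUF.finditer(log)]
    kvs = [tuple(map(int, m.groups())) for m in KV.finditer(log)]
    sizes = []
    for (bstart, bend, bvoid), (kstart, kend, klen) in zip(bufs, kvs):
        nbytes = (bend - bstart) % bvoid
        nrecords = (kend - kstart) % klen
        sizes.append((nbytes, nrecords, bvoid))
    return sizes


if __name__ == "__main__":
    for nbytes, nrecords, bvoid in spill_sizes(LOG):
        print(
            f"{nbytes} bytes, {nrecords} records, "
            f"{nbytes / nrecords:.1f} bytes/record, "
            f"{100 * nbytes / bvoid:.1f}% of byte buffer"
        )
```

      With the spill counter already at 96 in the excerpt, a single mapper at this rate has written on the order of 25 million small records, so the task may still be spilling and merging its output rather than being stuck.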

      This problem does not occur with a toy 100 x 100 matrix, but it is always reproducible with the original matrix.
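      One possible reason the toy matrix behaves differently, offered as an assumption rather than a confirmed diagnosis: if the CooccurrencesMapper writes one record for every pair of rows that have an entry in the same column (self-pairs included), the map output grows quadratically with the number of entries per column. The sketch below only illustrates that arithmetic. The function names and the per-column counts are hypothetical and are not measurements from the matrix in this report.

```python
def emitted_pairs(entries_in_column: int) -> int:
    """Unordered row pairs, self-pairs included, for one column.

    Assumption: one output record per pair of rows sharing the column.
    """
    n = entries_in_column
    return n * (n + 1) // 2


def total_pairs(column_counts) -> int:
    """Sum the per-column pair counts over all columns."""
    return sum(emitted_pairs(n) for n in column_counts)


if __name__ == "__main__":
    # Upper bound for a fully dense 100 x 100 toy matrix:
    # 100 columns with 100 entries each.
    print("toy matrix:", total_pairs([100] * 100))

    # Hypothetical single columns, to show the quadratic growth.
    for n in (100, 1_000, 10_000):
        print(f"column with {n} entries:", emitted_pairs(n))
```

      Under this assumption, a tenfold increase in the entries of a column gives roughly a hundredfold increase in records for that column, so a few heavily populated columns can dominate the mapper's output.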

      Any ideas on what could be causing this?

      Thanks,
      Maya Hristakeva

          People

            Assignee: Unassigned
            Reporter: Maya Hristakeva (mhristakeva)