Mahout / MAHOUT-1469

Streaming KMeans fails when executed in MapReduce mode and REDUCE_STREAMING_KMEANS is set to true

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9
    • Fix Version/s: 1.0.0
    • Component/s: classic

    Description

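      No centroids are generated when Streaming KMeans is executed in MapReduce (MR) mode with the -rskm (REDUCE_STREAMING_KMEANS) flag set. The reducer reports zero centroids and the final BallKMeans step then fails: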

      14/03/20 02:42:12 INFO mapreduce.StreamingKMeansThread: Estimated Points: 282
      14/03/20 02:42:12 INFO mapred.JobClient:  map 100% reduce 0%
      14/03/20 02:42:14 INFO mapreduce.StreamingKMeansReducer: Number of Centroids: 0
      14/03/20 02:42:14 WARN mapred.LocalJobRunner: job_local1374896815_0001
      java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
      	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148)
      	at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
      	at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
      	at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
      	at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
      	at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
      	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
      	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
      	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
      14/03/20 02:42:14 INFO mapred.JobClient: Job complete: job_local1374896815_0001
      14/03/20 02:42:14 INFO mapred.JobClient: Counters: 16
      14/03/20 02:42:14 INFO mapred.JobClient:   File Input Format Counters 
      14/03/20 02:42:14 INFO mapred.JobClient:     Bytes Read=17156391
      14/03/20 02:42:14 INFO mapred.JobClient:   FileSystemCounters
      14/03/20 02:42:14 INFO mapred.JobClient:     FILE_BYTES_READ=41925624
      14/03/20 02:42:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=25974741
      14/03/20 02:42:14 INFO mapred.JobClient:   Map-Reduce Framework
      14/03/20 02:42:14 INFO mapred.JobClient:     Map output materialized bytes=956293
      14/03/20 02:42:14 INFO mapred.JobClient:     Map input records=21578
      14/03/20 02:42:14 INFO mapred.JobClient:     Reduce shuffle bytes=0
      14/03/20 02:42:14 INFO mapred.JobClient:     Spilled Records=282
      14/03/20 02:42:14 INFO mapred.JobClient:     Map output bytes=1788012
      14/03/20 02:42:14 INFO mapred.JobClient:     Total committed heap usage (bytes)=217214976
      14/03/20 02:42:14 INFO mapred.JobClient:     Combine input records=0
      14/03/20 02:42:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=163
      14/03/20 02:42:14 INFO mapred.JobClient:     Reduce input records=0
      14/03/20 02:42:14 INFO mapred.JobClient:     Reduce input groups=0
      14/03/20 02:42:14 INFO mapred.JobClient:     Combine output records=0
      14/03/20 02:42:14 INFO mapred.JobClient:     Reduce output records=0
      14/03/20 02:42:14 INFO mapred.JobClient:     Map output records=282
      14/03/20 02:42:14 INFO driver.MahoutDriver: Program took 506269 ms (Minutes: 8.437816666666667)
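The log above shows 282 map output records but 0 reduce input records, so the reducer appears to hand an empty set of centroids to the train/test split. The following is a minimal sketch of why such a split check rejects an empty input. It is a simplified stand-in and not Mahout's BallKMeans code: the class `EmptySplitSketch`, the method `testSize`, and the exact bounds check are assumptions made for illustration.

```java
/**
 * Hypothetical illustration of a train/test split precondition.
 * NOT Mahout's implementation; names and logic are assumed for this sketch.
 */
public class EmptySplitSketch {

    /**
     * Computes how many vectors would go to the test set and rejects
     * splits that leave either the training or the test set empty.
     */
    static int testSize(int numVectors, double testProbability) {
        int numTest = (int) (testProbability * numVectors);
        if (numTest <= 0 || numTest >= numVectors) {
            throw new IllegalArgumentException(String.format(
                "Must have nonzero number of training and test vectors. "
                    + "Asked for %.1f %% of %d vectors for test",
                testProbability * 100, numVectors));
        }
        return numTest;
    }

    public static void main(String[] args) {
        // With 282 intermediate centroids (the estimate in the log), a 10% split is valid.
        System.out.println("test vectors: " + testSize(282, 0.1));

        // With 0 centroids reaching the reducer, any split is rejected.
        try {
            testSize(0, 0.1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the failure follows directly from the input size: any percentage of zero vectors yields zero test vectors, so the precondition can never hold once the reducer receives no records.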
      

            People

              Assignee: Suneel Marthi (smarthi)
              Reporter: Suneel Marthi (smarthi)
              Votes: 0
              Watchers: 7