Description
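Centroids are not generated when Streaming KMeans is run in MapReduce (MR) mode with the -rskm flag set. The reducer reports zero centroids and then fails with an IllegalArgumentException: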
14/03/20 02:42:12 INFO mapreduce.StreamingKMeansThread: Estimated Points: 282
14/03/20 02:42:12 INFO mapred.JobClient: map 100% reduce 0%
14/03/20 02:42:14 INFO mapreduce.StreamingKMeansReducer: Number of Centroids: 0
14/03/20 02:42:14 WARN mapred.LocalJobRunner: job_local1374896815_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
14/03/20 02:42:14 INFO mapred.JobClient: Job complete: job_local1374896815_0001
14/03/20 02:42:14 INFO mapred.JobClient: Counters: 16
14/03/20 02:42:14 INFO mapred.JobClient: File Input Format Counters
14/03/20 02:42:14 INFO mapred.JobClient: Bytes Read=17156391
14/03/20 02:42:14 INFO mapred.JobClient: FileSystemCounters
14/03/20 02:42:14 INFO mapred.JobClient: FILE_BYTES_READ=41925624
14/03/20 02:42:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=25974741
14/03/20 02:42:14 INFO mapred.JobClient: Map-Reduce Framework
14/03/20 02:42:14 INFO mapred.JobClient: Map output materialized bytes=956293
14/03/20 02:42:14 INFO mapred.JobClient: Map input records=21578
14/03/20 02:42:14 INFO mapred.JobClient: Reduce shuffle bytes=0
14/03/20 02:42:14 INFO mapred.JobClient: Spilled Records=282
14/03/20 02:42:14 INFO mapred.JobClient: Map output bytes=1788012
14/03/20 02:42:14 INFO mapred.JobClient: Total committed heap usage (bytes)=217214976
14/03/20 02:42:14 INFO mapred.JobClient: Combine input records=0
14/03/20 02:42:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=163
14/03/20 02:42:14 INFO mapred.JobClient: Reduce input records=0
14/03/20 02:42:14 INFO mapred.JobClient: Reduce input groups=0
14/03/20 02:42:14 INFO mapred.JobClient: Combine output records=0
14/03/20 02:42:14 INFO mapred.JobClient: Reduce output records=0
14/03/20 02:42:14 INFO mapred.JobClient: Map output records=282
14/03/20 02:42:14 INFO driver.MahoutDriver: Program took 506269 ms (Minutes: 8.437816666666667)
Issue Links
- supersedes MAHOUT-1486 Streaming KMeans NPE (Closed)