Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4
    • Component/s: None
    • Labels:
      None

      Description

      I ran the k-means algorithm on the Synthetic Control data and got the error below. The Canopy algorithm runs fine on the same data. The error comes from the mapper. I am using trunk.

      10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
      java.lang.IllegalStateException: Cluster is empty!
      at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
      at org.apache.hadoop.mapred.Child.main(Child.java:170)

        Activity

        Paritosh Ranjan added a comment -

        The Examples Cluster Reuters build is now showing the same problem, which is causing the build to fail.
        See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/79/console. I am also attaching part of the log.

        The previous build passed, and there has been no code change in between.

        The fifth and sixth lines of the log show that the path containing the clusters is being deleted.

        Can anyone think of a reason for this intermittent failure?

        12/03/22 19:20:46 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[/tmp/mahout-work-hudson/reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/tmp/mahout-work-hudson/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/tmp/mahout-work-hudson/reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
        12/03/22 19:20:46 INFO common.HadoopUtil: Deleting /tmp/mahout-work-hudson/reuters-kmeans
        12/03/22 19:20:46 INFO common.HadoopUtil: Deleting /tmp/mahout-work-hudson/reuters-kmeans-clusters
        12/03/22 19:20:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        12/03/22 19:20:46 INFO compress.CodecPool: Got brand-new compressor
        12/03/22 19:20:47 INFO kmeans.RandomSeedGenerator: Wrote 20 vectors to /tmp/mahout-work-hudson/reuters-kmeans-clusters/part-randomSeed
        12/03/22 19:20:47 INFO kmeans.KMeansDriver: Input: /tmp/mahout-work-hudson/reuters-out-seqdir-sparse-kmeans/tfidf-vectors Clusters In: /tmp/mahout-work-hudson/reuters-kmeans-clusters/part-randomSeed Out: /tmp/mahout-work-hudson/reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
        12/03/22 19:20:47 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
        12/03/22 19:20:47 INFO kmeans.KMeansDriver: K-Means Iteration 1
        12/03/22 19:20:47 INFO input.FileInputFormat: Total input paths to process : 1
        12/03/22 19:20:47 INFO mapred.JobClient: Running job: job_local_0001
        12/03/22 19:20:47 INFO mapred.MapTask: io.sort.mb = 100
        12/03/22 19:20:48 INFO mapred.MapTask: data buffer = 79691776/99614720
        12/03/22 19:20:48 INFO mapred.MapTask: record buffer = 262144/327680
        12/03/22 19:20:48 INFO compress.CodecPool: Got brand-new decompressor
        12/03/22 19:20:48 WARN mapred.LocalJobRunner: job_local_0001
        java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
        12/03/22 19:20:48 INFO mapred.JobClient: map 0% reduce 0%
        12/03/22 19:20:48 INFO mapred.JobClient: Job complete: job_local_0001
        12/03/22 19:20:48 INFO mapred.JobClient: Counters: 0
        Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hudson/reuters-kmeans-clusters/part-randomSeed
        at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:395)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:339)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:261)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:169)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:119)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:63)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
        Build step 'Execute shell' marked build as failure

        qiang xu added a comment - edited

        I do not think anything is wrong with the path, because /user/root/examples/bin/work/clusters is generated by the kmeans example itself.
        These are all of my steps:
        ./bin/mahout org.apache.lucene.benchmark.utils.ExtractReuters ./examples/bin/work/reuters-sgm/ ./examples/bin/work/reuters-out/
        ./bin/mahout seqdirectory -i ./examples/bin/work/reuters-out/ -o ./examples/bin/work/reuters-out-seqdir -c UTF-8 -chunk 5 -ow
        ./bin/mahout seq2sparse -i ./examples/bin/work/reuters-out-seqdir/ -o ./examples/bin/work/reuters-out-seqdir-sparse
        ./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -k 20 -ow
        ./bin/mahout clusterdump -s examples/bin/work/reuters-kmeans/clusters-10 -d examples/bin/work/reuters-out-seqdir-sparse/dictionary.file-0 -dt sequencefile -b 100 -n 20

        I have also tested with the absolute HDFS path, as follows:
        [root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/
        Found 4 items
        drwxr-xr-x - root supergroup 0 2012-02-14 20:55 /user/root/examples/bin/work/clusters
        drwxr-xr-x - root supergroup 0 2012-02-14 20:56 /user/root/examples/bin/work/reuters-kmeans
        drwxr-xr-x - root supergroup 0 2012-02-14 20:29 /user/root/examples/bin/work/reuters-out-seqdir
        drwxr-xr-x - root supergroup 0 2012-02-14 20:32 /user/root/examples/bin/work/reuters-out-seqdir-sparse
        [root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/clusters
        Found 1 items
        -rw-r--r-- 2 root supergroup 139 2012-02-14 20:55 /user/root/examples/bin/work/clusters/part-randomSeed
        [root@qxutest mahout-distribution-0.5]# ./bin/mahout kmeans -i /user/root/examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c /user/root/examples/bin/work/clusters -o /user/root/examples/bin/work/reuters-kmeans -x 10 -ow
        Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
        HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
        12/02/15 10:32:25 INFO common.AbstractJob: Command line arguments: {--clusters=/user/root/examples/bin/work/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=/user/root/examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/, --maxIter=10, --method=mapreduce, --output=/user/root/examples/bin/work/reuters-kmeans, --overwrite=null, --startPhase=0, --tempDir=temp}
        12/02/15 10:32:25 INFO common.HadoopUtil: Deleting /user/root/examples/bin/work/reuters-kmeans
        12/02/15 10:32:25 INFO kmeans.KMeansDriver: Input: /user/root/examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In: /user/root/examples/bin/work/clusters Out: /user/root/examples/bin/work/reuters-kmeans Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
        12/02/15 10:32:25 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
        12/02/15 10:32:25 INFO kmeans.KMeansDriver: K-Means Iteration 1
        12/02/15 10:32:26 INFO input.FileInputFormat: Total input paths to process : 1
        12/02/15 10:32:27 INFO mapred.JobClient: Running job: job_201202131515_0123
        12/02/15 10:32:28 INFO mapred.JobClient: map 0% reduce 0%
        12/02/15 10:32:38 INFO mapred.JobClient: Task Id : attempt_201202131515_0123_m_000000_0, Status : FAILED
        java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        The same failure occurs without the leading ./:
        ./bin/mahout kmeans -i examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c examples/bin/work/clusters -o examples/bin/work/reuters-kmeans -x 10 -ow
        Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
        HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
        12/02/15 10:38:36 INFO common.AbstractJob: Command line arguments: {--clusters=examples/bin/work/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/, --maxIter=10, --method=mapreduce, --output=examples/bin/work/reuters-kmeans, --overwrite=null, --startPhase=0, --tempDir=temp}
        12/02/15 10:38:37 INFO common.HadoopUtil: Deleting examples/bin/work/reuters-kmeans
        12/02/15 10:38:37 INFO kmeans.KMeansDriver: Input: examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In: examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
        12/02/15 10:38:37 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
        12/02/15 10:38:37 INFO kmeans.KMeansDriver: K-Means Iteration 1
        12/02/15 10:38:37 INFO input.FileInputFormat: Total input paths to process : 1
        12/02/15 10:38:38 INFO mapred.JobClient: Running job: job_201202131515_0124
        12/02/15 10:38:39 INFO mapred.JobClient: map 0% reduce 0%
        12/02/15 10:38:50 INFO mapred.JobClient: Task Id : attempt_201202131515_0124_m_000000_0, Status : FAILED
        java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        Sean Owen added a comment -

        Is this valid as a path to clusters? Shouldn't it be on HDFS? ./examples/bin/work/clusters
        Something is wrong with your input.
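
        For context on the path question: on HDFS, a relative path is resolved against the user's home directory (by default /user/<username>), which matches the /user/root/... listings elsewhere in this issue. The following is a minimal sketch of that resolution using only java.net.URI. The class name, the helper name, and the `namenode` host are hypothetical, and the code only mimics the string resolution; it does not call any Hadoop API.

```java
import java.net.URI;

public class RelativePathSketch {
    // Hypothetical helper: resolve a possibly-relative path against an HDFS-style home directory.
    static String resolveAgainstHome(String homeDir, String path) {
        // The trailing slash makes the home directory act as the base folder for resolution.
        URI base = URI.create("hdfs://namenode" + homeDir + "/");
        return base.resolve(path).getPath();
    }

    public static void main(String[] args) {
        // The three spellings of the -c argument used in this issue.
        System.out.println(resolveAgainstHome("/user/root", "./examples/bin/work/clusters"));
        System.out.println(resolveAgainstHome("/user/root", "examples/bin/work/clusters"));
        System.out.println(resolveAgainstHome("/user/root", "/user/root/examples/bin/work/clusters"));
    }
}
```

        Under these assumptions all three spellings name the same directory, so the form of the -c argument alone would not explain the failure.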

        qiang xu added a comment - edited

        This problem still exists in Mahout 0.5 and 0.6:
        ./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow
        Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
        HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
        12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments: {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/, --maxIter=10, --method=mapreduce, --output=./examples/bin/work/reuters-kmeans, --overwrite=null, --startPhase=0, --tempDir=temp}
        12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input: examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In: examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
        12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
        12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
        12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process : 1
        12/02/14 20:56:06 INFO mapred.JobClient: Running job: job_201202131515_0122
        12/02/14 20:56:07 INFO mapred.JobClient: map 0% reduce 0%
        12/02/14 20:56:16 INFO mapred.JobClient: Task Id : attempt_201202131515_0122_m_000000_0, Status : FAILED
        java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
        It is really weird, because the clusters directory is generated:
        [root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/
        Found 4 items
        drwxr-xr-x - root supergroup 0 2012-02-14 20:55 /user/root/examples/bin/work/clusters
        drwxr-xr-x - root supergroup 0 2012-02-14 20:56 /user/root/examples/bin/work/reuters-kmeans
        drwxr-xr-x - root supergroup 0 2012-02-14 20:29 /user/root/examples/bin/work/reuters-out-seqdir
        drwxr-xr-x - root supergroup 0 2012-02-14 20:32 /user/root/examples/bin/work/reuters-out-seqdir-sparse
        [root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/clusters
        Found 1 items
        -rw-r--r-- 2 root supergroup 139 2012-02-14 20:55 /user/root/examples/bin/work/clusters/part-randomSeed

        I followed the guide at https://cwiki.apache.org/MAHOUT/k-means-clustering.html

        Hudson added a comment -

        Integrated in Mahout-Quality #382 (See https://hudson.apache.org/hudson/job/Mahout-Quality/382/)
        MAHOUT-504:

        • Added job completion tests to break out of iterations if errors occur
        • Fixed canopy cluster mapper initialization problem with _log files on Hadoop
        • All synthetic control examples run on Hadoop cluster
        • All unit tests run
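
        The first item in the list above can be illustrated with a minimal sketch, assuming a driver loop that runs one job per iteration. The `IterationJob` interface and the `runIterations` helper are hypothetical stand-ins, not Mahout's actual driver code. The point is only that a failed job aborts the loop instead of letting later iterations run against missing output.

```java
public class IterationLoopSketch {
    /** Hypothetical stand-in for one MapReduce job; returns true if the job completed successfully. */
    interface IterationJob {
        boolean run(int iteration);
    }

    /** Runs up to maxIterations jobs and returns the number of iterations completed. */
    static int runIterations(IterationJob job, int maxIterations) {
        for (int i = 1; i <= maxIterations; i++) {
            // Check the job result and stop immediately on failure.
            if (!job.run(i)) {
                throw new IllegalStateException("Iteration " + i + " failed");
            }
        }
        return maxIterations;
    }

    public static void main(String[] args) {
        // Simulated job that succeeds twice and fails on the third iteration.
        try {
            runIterations(i -> i < 3, 10);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

        In this sketch the loop stops at the first failed job, so iterations 4 through 10 never start.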
        Hudson added a comment -

        Integrated in Mahout-Quality #375 (See https://hudson.apache.org/hudson/job/Mahout-Quality/375/)
        MAHOUT-504: reworded error message in cluster mapper for clarity
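
        A minimal sketch of the kind of check behind this message, assuming the mapper lists the files under the clusters path during setup. The class and method names are hypothetical, and the code uses the local filesystem through java.nio rather than Hadoop's FileSystem API. Skipping entries whose names start with `_` or `.` reflects the `_log` file problem mentioned in the MAHOUT-504 notes, but it is an assumption about the approach, not the actual patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ClusterPathCheckSketch {
    /** Lists candidate cluster files under dir, ignoring bookkeeping entries such as _logs. */
    static List<String> listClusterFiles(Path dir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                // Skip job bookkeeping and hidden entries; they hold no cluster data.
                if (!name.startsWith("_") && !name.startsWith(".")) {
                    names.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Fail fast with the reworded message if nothing usable was found.
        if (names.isEmpty()) {
            throw new IllegalStateException("No clusters found. Check your -c path.");
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("clusters");
        Files.createDirectory(dir.resolve("_logs"));
        Files.createFile(dir.resolve("part-randomSeed"));
        System.out.println(listClusterFiles(dir));
    }
}
```

        A directory that contains only `_logs` would trigger the exception here, which is one way an apparently populated clusters path can still yield no clusters.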

        pragnesh added a comment - edited

        I am also getting the same exception with trunk code:

        10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
        10/10/04 12:42:35 INFO mapred.JobClient: map 0% reduce 0%
        10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
        java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        This runs fine from Eclipse, and $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job also runs without any error.

        But when I run the k-means job from the command line with Hadoop, I see the following output:

        pragnesh-laptop% $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
        Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
        HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
        10/10/05 12:26:05 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
        10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
        10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
        10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
        10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process : 1
        10/10/05 12:26:09 INFO mapred.JobClient: Running job: job_201010051117_0005
        10/10/05 12:26:10 INFO mapred.JobClient: map 0% reduce 0%
        10/10/05 12:26:26 INFO mapred.JobClient: map 100% reduce 0%
        10/10/05 12:26:28 INFO mapred.JobClient: Job complete: job_201010051117_0005
        10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
        10/10/05 12:26:29 INFO mapred.JobClient: Job Counters
        10/10/05 12:26:29 INFO mapred.JobClient: Launched map tasks=1
        10/10/05 12:26:29 INFO mapred.JobClient: Data-local map tasks=1
        10/10/05 12:26:29 INFO mapred.JobClient: FileSystemCounters
        10/10/05 12:26:29 INFO mapred.JobClient: HDFS_BYTES_READ=288374
        10/10/05 12:26:29 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=335470
        10/10/05 12:26:29 INFO mapred.JobClient: Map-Reduce Framework
        10/10/05 12:26:29 INFO mapred.JobClient: Map input records=600
        10/10/05 12:26:29 INFO mapred.JobClient: Spilled Records=0
        10/10/05 12:26:29 INFO mapred.JobClient: Map output records=600
        10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
        10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1: 80.0 t2: 55.0
        10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
        10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process : 1
        10/10/05 12:26:30 INFO mapred.JobClient: Running job: job_201010051117_0006
        10/10/05 12:26:31 INFO mapred.JobClient: map 0% reduce 0%
        10/10/05 12:26:42 INFO mapred.JobClient: map 100% reduce 0%
        10/10/05 12:26:54 INFO mapred.JobClient: map 100% reduce 100%
        10/10/05 12:26:56 INFO mapred.JobClient: Job complete: job_201010051117_0006
        10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
        10/10/05 12:26:56 INFO mapred.JobClient: Job Counters
        10/10/05 12:26:56 INFO mapred.JobClient: Launched reduce tasks=1
        10/10/05 12:26:56 INFO mapred.JobClient: Launched map tasks=1
        10/10/05 12:26:56 INFO mapred.JobClient: Data-local map tasks=1
        10/10/05 12:26:56 INFO mapred.JobClient: FileSystemCounters
        10/10/05 12:26:56 INFO mapred.JobClient: FILE_BYTES_READ=13906
        10/10/05 12:26:56 INFO mapred.JobClient: HDFS_BYTES_READ=335470
        10/10/05 12:26:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=27844
        10/10/05 12:26:56 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=7131
        10/10/05 12:26:56 INFO mapred.JobClient: Map-Reduce Framework
        10/10/05 12:26:56 INFO mapred.JobClient: Reduce input groups=1
        10/10/05 12:26:56 INFO mapred.JobClient: Combine output records=0
        10/10/05 12:26:56 INFO mapred.JobClient: Map input records=600
        10/10/05 12:26:56 INFO mapred.JobClient: Reduce shuffle bytes=0
        10/10/05 12:26:56 INFO mapred.JobClient: Reduce output records=6
        10/10/05 12:26:56 INFO mapred.JobClient: Spilled Records=50
        10/10/05 12:26:56 INFO mapred.JobClient: Map output bytes=13800
        10/10/05 12:26:56 INFO mapred.JobClient: Combine input records=0
        10/10/05 12:26:56 INFO mapred.JobClient: Map output records=25
        10/10/05 12:26:56 INFO mapred.JobClient: Reduce input records=25
        10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
        10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
        10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
        10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
        10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
        10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process : 1
        10/10/05 12:26:58 INFO mapred.JobClient: Running job: job_201010051117_0007
        10/10/05 12:26:59 INFO mapred.JobClient: map 0% reduce 0%
        10/10/05 12:27:08 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_0, Status : FAILED
        java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        10/10/05 12:27:14 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_1, Status : FAILED
        java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        10/10/05 12:27:23 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_2, Status : FAILED
        java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        10/10/05 12:27:35 INFO mapred.JobClient: Job complete: job_201010051117_0007
        10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
        10/10/05 12:27:35 INFO mapred.JobClient: Job Counters
        10/10/05 12:27:35 INFO mapred.JobClient: Launched map tasks=4
        10/10/05 12:27:35 INFO mapred.JobClient: Data-local map tasks=4
        10/10/05 12:27:35 INFO mapred.JobClient: Failed map tasks=1
        10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
        10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
        10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-1 Out: output/clusteredPoints Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
        10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.math.VectorWritable
        10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
        10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process : 1
        10/10/05 12:27:37 INFO mapred.JobClient: Running job: job_201010051117_0008
        10/10/05 12:27:38 INFO mapred.JobClient: map 0% reduce 0%
        10/10/05 12:27:47 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_0, Status : FAILED
        java.lang.IllegalStateException: Cluster is empty!
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        10/10/05 12:27:53 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_1, Status : FAILED
        java.lang.IllegalStateException: Cluster is empty!
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        10/10/05 12:27:59 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_2, Status : FAILED
        java.lang.IllegalStateException: Cluster is empty!
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        10/10/05 12:28:11 INFO mapred.JobClient: Job complete: job_201010051117_0008
        10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
        10/10/05 12:28:11 INFO mapred.JobClient: Job Counters
        10/10/05 12:28:11 INFO mapred.JobClient: Launched map tasks=4
        10/10/05 12:28:11 INFO mapred.JobClient: Data-local map tasks=4
        10/10/05 12:28:11 INFO mapred.JobClient: Failed map tasks=1
        10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
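
        Both exceptions in the log above are raised in a mapper's setup(), apparently because nothing could be loaded from the "Clusters In" path (output/clusters-0 for the iteration, output/clusters-1 for the clustering step). A minimal sketch of a pre-flight check follows. It rests on assumptions: the class and method names are hypothetical and not part of Mahout, the cluster files are assumed to follow Hadoop's usual part-* naming, and the check uses java.nio against a local directory (as in a local run from Eclipse). For a run against HDFS, as in this log, the same idea would have to be written with Hadoop's FileSystem API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Mahout: checks a LOCAL clusters directory.
public class ClusterPathCheck {

    /**
     * Counts the non-empty files named part-* directly under clustersDir.
     * Returns 0 when clustersDir does not exist or is not a directory.
     */
    public static int countNonEmptyPartFiles(Path clustersDir) throws IOException {
        if (!Files.isDirectory(clustersDir)) {
            return 0;
        }
        int count = 0;
        // The glob "part-*" is an assumption about how the cluster files are named.
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(clustersDir, "part-*")) {
            for (Path part : parts) {
                if (Files.isRegularFile(part) && Files.size(part) > 0) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Default mirrors the "Clusters In" path shown in the log above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "output/clusters-0");
        int found = countNonEmptyPartFiles(dir);
        if (found == 0) {
            System.err.println("No non-empty part-* files under " + dir);
            System.exit(1);
        }
        System.out.println(found + " non-empty part-* file(s) under " + dir);
    }
}
```

        Running such a check between the Canopy step and the first k-means iteration would show whether the initial clusters were written where the k-means job looks for them.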

        Zhen Guo added a comment -

        Jeff, did you run the following command recently?

        $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

        I am using the trunk code from Sept. 27, and it does not work for me. I get the following error message:

        10/09/30 20:58:07 INFO mapred.JobClient: Task Id : attempt_201008261432_2003_m_000000_0, Status : FAILED
        java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        Jeff Eastman added a comment -

        Trunk is, afaict, working for all synthetic control jobs; both with the default arguments and with user-supplied arguments. There was a problem in 0.3 and some of these issues relate to that edition. This issue should be closed. Does anybody disagree? Zhen?

        Zhen Guo added a comment -

        Is this change available in Trunk?

        I tested as described in the Quick Start document, using the following command:

        $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

        It failed and the error messages are the same as above.

        Hudson added a comment -

        Integrated in Mahout-Quality #339 (See https://hudson.apache.org/hudson/job/Mahout-Quality/339/)
        MAHOUT-504: improved error message in Fuzzy k-Means

        Zhen Guo added a comment -

        It still fails, but for a different reason.

        10/09/25 01:29:11 INFO mapred.JobClient: Task Id : attempt_201008261432_1574_m_000000_0, Status : FAILED
        java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

        Hudson added a comment -

        Integrated in Mahout-Quality #322 (See https://hudson.apache.org/hudson/job/Mahout-Quality/322/)
        MAHOUT-504. Fixed CLI arguments and did other refactoring of synthetic control
        example. Tested CLI invocation with explicit arguments which was the source of
        the problems cited in this issue. All tests run


          People

          • Assignee:
            Robin Anil
            Reporter:
            Zhen Guo
          • Votes:
            0
            Watchers:
            1
