Mahout / MAHOUT-1452

KMeans: unexpected behaviour when the filesystem scheme is removed from the output path (method mapreduce)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 0.9
    • Fix Version/s: 0.10.0
    • Component/s: Clustering
    • Environment: CentOS, CDH4.6 (3-node cluster)

      Description

      If the hdfs scheme is removed from the output path, KMeans creates clusters-0 in the local file system and clusters-1 in HDFS. It then fails with an error, because it expects clusters-0 to be in HDFS. See the stack trace below.

      2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments: {--clustering=null, --clusters=[/3/clusters-0-final], --convergenceDelta=[0.1], --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100], --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
      2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence Clusters In: /3/clusters-0-final Out: /5
      2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max Iterations: 100
      2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
      2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input paths to process : 3
      2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job: job_201403111332_0011
      2014-03-11 14:52:20 o.a.h.m.JobClient [INFO] map 0% reduce 0%
      2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id : attempt_201403111332_0011_m_000000_0, Status : FAILED
      2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException: /5/clusters-0
      at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
      at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
      at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
      at org.apache.hadoop.mapred.Child.main(Child.java:262)
      Caused by: java.io.FileNotFoundException: File /5/clusters-0

      If you provide a full HDFS URI for the output path, it works like a charm.
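
The difference between a scheme-less output path and a full URI can be illustrated with a minimal sketch. It uses only `java.net.URI` and does not reproduce Hadoop's or Mahout's actual path handling. The class `PathQualifier`, the method `qualify`, and the address `hdfs://namenode:8020` are hypothetical names chosen for illustration. The assumption, not confirmed in this report, is that a path without a scheme is resolved against whatever default filesystem the resolving process sees, so the same string `/5` can point at different filesystems.

```java
import java.net.URI;

class PathQualifier {
    /**
     * Returns the path unchanged if it already carries a scheme;
     * otherwise resolves it against the given default filesystem URI.
     * Illustrative only: this is not Hadoop's own qualification logic.
     */
    static String qualify(String defaultFs, String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri.toString();
        }
        return URI.create(defaultFs).resolve(uri).toString();
    }

    public static void main(String[] args) {
        String output = "/5"; // scheme-less output path, as in the report

        // Same string, two different defaults, two different locations.
        System.out.println(qualify("file:///", output));
        System.out.println(qualify("hdfs://namenode:8020", output));

        // A fully qualified URI is unaffected by the default.
        System.out.println(qualify("file:///", "hdfs://namenode:8020/5"));
    }
}
```

Under that assumption, passing the output as a full `hdfs://` URI removes the ambiguity, which is consistent with the workaround described above.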

            People

            • Assignee: Unassigned
            • Reporter: Bikash Gupta (biks)
            • Votes: 0
            • Watchers: 1

                Time Tracking

                • Estimated: 72h
                • Remaining: 72h
                • Logged: Not Specified