Mahout / MAHOUT-1658

Kmeans fails when running on HDFS

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.9
    • Fix Version/s: 0.11.0
    • Component/s: Clustering
    • Labels:
    • Environment:

      CentOS 6.6 with HDP 2.2

      Description

      Hi,
      I was trying to run some of the Mahout examples on a Hadoop platform. When kmeans runs on the local host, it completes successfully. However, when it runs against HDFS with relative paths, Mahout looks for the intermediate results on the local filesystem instead of on HDFS.
      I have to use absolute paths for the input and output if I want kmeans to run correctly.

      Here is a typical error when running on HDFS:

      15/03/26 12:15:07 INFO mapreduce.Job: Task Id : attempt_1426848955524_0062_m_000000_2, Status : FAILED
      Error: java.lang.IllegalStateException: output/clusters-0
      at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
      at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
      at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
      at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:376)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
      at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:570)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
      at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
      at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
      ... 10 more

      15/03/26 12:15:16 INFO mapreduce.Job: map 100% reduce 0%
      15/03/26 12:15:17 INFO mapreduce.Job: map 100% reduce 100%
      15/03/26 12:15:17 INFO mapreduce.Job: Job job_1426848955524_0062 failed with state FAILED due to: Task failed task_1426848955524_0062_m_000000
      Job failed as tasks failed. failedMaps:1 failedReduces:0

      15/03/26 12:15:17 INFO mapreduce.Job: Counters: 9
      Job Counters
      Failed map tasks=4
      Launched map tasks=4
      Other local map tasks=3
      Rack-local map tasks=1
      Total time spent by all maps in occupied slots (ms)=23087
      Total time spent by all reduces in occupied slots (ms)=0
      Total time spent by all map tasks (ms)=23087
      Total vcore-seconds taken by all map tasks=23087
      Total megabyte-seconds taken by all map tasks=23641088
      Exception in thread "main" java.lang.InterruptedException: Cluster Iteration 1 failed processing output/clusters-1
      at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:183)
      at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:224)
      at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
      at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:135)
      at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:60)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
      at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
      at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
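
The `RawLocalFileSystem.listStatus` frame in the trace above is consistent with the report: the map task looked up `output/clusters-0` on the local filesystem. The sketch below illustrates, using only `java.net.URI`, how the same relative path names two different locations depending on which filesystem base it is resolved against, and why a fully qualified path is unambiguous. It is an illustration under stated assumptions, not Mahout or Hadoop code: the class `PathQualificationSketch`, the method `qualify`, the host `namenode:8020`, the user `alice`, and the local directory are all hypothetical.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Mahout or Hadoop.
final class PathQualificationSketch {

    // Resolves a path against a base directory URI.
    // The base must end with '/' so that it is treated as a directory.
    // An already absolute URI (with a scheme) is returned unchanged.
    static URI qualify(URI baseDir, String path) {
        return baseDir.resolve(path);
    }

    public static void main(String[] args) {
        // Assumed example locations; real clusters will differ.
        URI hdfsHome = URI.create("hdfs://namenode:8020/user/alice/");
        URI localCwd = URI.create("file:///tmp/container_01/");

        String relative = "output/clusters-0";
        String absolute = "hdfs://namenode:8020/user/alice/output/clusters-0";

        // The same relative string names two different locations.
        System.out.println("against HDFS home : " + qualify(hdfsHome, relative));
        System.out.println("against local cwd : " + qualify(localCwd, relative));

        // A fully qualified path ignores the base it is resolved against.
        System.out.println("absolute path     : " + qualify(localCwd, absolute));
    }
}
```

This matches the workaround in the description: passing absolute input and output paths removes the dependence on whichever filesystem the task treats as its default.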

              People

              • Assignee: Andrew Musselman (andrew.musselman)
              • Reporter: Ha Son Hai (hasonhai)
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  • Estimated: 1h
                  • Remaining: 1h
                  • Logged: Not Specified