  1. Mahout
  2. MAHOUT-326

a possible bug with the isConverged() method in KMeansDriver.java


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.4
    • Component/s: classic
    • Labels: None

    Description

      In one of today's test runs using the clustering example from the book "Mahout in Action", I noticed the following exception thrown by KMeansClusterMapper:

      ----------------------------
      java.lang.RuntimeException: Error in configuring object
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
          at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
          at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
          at org.apache.hadoop.mapred.Child.main(Child.java:159)
      Caused by: java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
          ... 5 more
      Caused by: java.lang.RuntimeException: Error in configuring object
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
          at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
          at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
          at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
          ... 10 more
      Caused by: java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
          ... 13 more
      Caused by: java.lang.NullPointerException: Cluster is empty!!!
          at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.configure(KMeansClusterMapper.java:63)
      ----------------------------

      which says that the runClustering method didn't see the cluster output. The same map task did finally succeed after a few failed attempts.

      After looking into KMeansDriver.java, I think there may be a bug in the isConverged method. Basically, this method doesn't wait for the cluster output file to be fully populated. If the part-* file doesn't exist yet or has not been fully written, the method can return true prematurely. I am not sure whether this is a bug in Hadoop itself, since it may report a job as successful before the mapred output file is fully written. In the meantime, a possible fix is to make isConverged wait for the cluster output file to exist and check that it contains the 'converged' values for all the clusters.
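      To make the suggested fix concrete, here is a rough sketch of the idea, not the actual KMeansDriver code. It uses plain java.nio on a local directory as a stand-in for the Hadoop FileSystem, and it assumes a made-up text record format of one "<clusterId><TAB><true|false>" line per cluster (the real output is a binary sequence file). The class name ConvergenceCheck, the State enum, and the expectedClusters, timeoutMillis and pollMillis parameters are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the convergence check; not Mahout's actual code. */
final class ConvergenceCheck {

  /** Possible outcomes of one look at the cluster output directory. */
  enum State { NOT_READY, NOT_CONVERGED, CONVERGED }

  private ConvergenceCheck() {}

  /**
   * Reads every part-* file once. Each non-blank line is assumed to be
   * "<clusterId>\t<true|false>" (an illustrative format only).
   */
  static State inspect(Path outputDir, int expectedClusters) throws IOException {
    if (!Files.isDirectory(outputDir)) {
      return State.NOT_READY;
    }
    List<String> records = new ArrayList<>();
    try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
      for (Path part : parts) {
        for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
          if (!line.trim().isEmpty()) {
            records.add(line);
          }
        }
      }
    }
    // Fewer records than clusters: output is missing or only partly written.
    if (records.size() < expectedClusters) {
      return State.NOT_READY;
    }
    // A malformed or truncated record counts as "not converged", which only
    // costs an extra iteration instead of ending the run too early.
    for (String record : records) {
      String[] fields = record.split("\t");
      if (fields.length < 2 || !Boolean.parseBoolean(fields[1].trim())) {
        return State.NOT_CONVERGED;
      }
    }
    return State.CONVERGED;
  }

  /** Polls until the output is complete, or fails once the timeout elapses. */
  static boolean isConverged(Path outputDir, int expectedClusters,
                             long timeoutMillis, long pollMillis)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (true) {
      State state = inspect(outputDir, expectedClusters);
      if (state != State.NOT_READY) {
        return state == State.CONVERGED;
      }
      if (System.currentTimeMillis() >= deadline) {
        throw new IOException("Cluster output not complete in " + outputDir);
      }
      Thread.sleep(pollMillis);
    }
  }
}
```

      The point of the three-valued State is that "no output yet" is kept separate from "not converged", so a missing or half-written part-* file can never be read as convergence. Whether polling is the right remedy, as opposed to fixing whatever lets the job be reported successful too early, is still an open question.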

      Please note that I have seen this problem only once in the many test runs I have done so far, so it may be difficult to reproduce. If you need any further information, please let me know.

      Thanks.

      Attachments

        1. mahout_bug.png
          146 kB
          Chad Chen

          People

            Assignee: Jeff Eastman (jeastman)
            Reporter: Chad Chen (wc2010)