Mahout / MAHOUT-1328

CLI-invoked K-means final step (Cluster Classification Driver) ignores job-specific -D MR parameters

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.8
    • Fix Version/s: 0.8
    • Component/s: Clustering
    • Labels:
      None

      Description

      I believe this is an issue - someone please correct me if not!

      I am running a large k-means clustering job. Our cluster's default settings (map/reduce slots per node, JVM memory parameters, etc.) are not appropriate for its memory requirements.

      So I invoke k-means clustering from the CLI with job-specific -D overrides, for example:

      mahout kmeans -i /mahout-input -o /mahout-output -c clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 12 -ow -k 50 -cl -Dmapred.child.java.opts=-Xmx7096m -Dmapred.tasktracker.reduce.tasks.maximum=1 -Dmapred.tasktracker.map.tasks.maximum=1 -Dmapred.job.map.memory.mb=7000 -Dmapred.cluster.max.map.memory.mb=7000 -Dmapred.cluster.reduce.memory.mb=7000 -Dmapred.cluster.max.reduce.memory.mb=7000

      The initial MR tasks for each clustering iteration run successfully. Inspecting the Hadoop config for each of those jobs after completion shows that it ran with the MR configuration explicitly provided via the -D parameters.

      However, when the final cluster classification job runs (i.e. the step that generates the clusteredPoints/ directory), it usually fails with out-of-memory errors. The MR task logs for it show that it ran with the cluster's default settings, not those provided by my -D CLI parameters.
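
      One way to confirm which overrides a given job actually picked up is to compare the -D arguments against that job's saved configuration XML. The following is a minimal Python sketch, not part of Mahout or Hadoop. It assumes the configuration uses Hadoop's usual configuration/property/name/value XML layout; the sample XML and the helper names (load_job_conf, parse_d_options, find_ignored) are made up for illustration.

```python
import xml.etree.ElementTree as ET


def load_job_conf(xml_text):
    """Parse Hadoop-style configuration XML text into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name] = prop.findtext("value")
    return conf


def parse_d_options(args):
    """Collect -Dkey=value pairs from a list of CLI arguments."""
    expected = {}
    for arg in args:
        if arg.startswith("-D") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            expected[key] = value
    return expected


def find_ignored(expected, actual):
    """Return {key: (expected, actual)} for each override that did not take effect."""
    return {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}


# Hypothetical excerpt of a job configuration; real files contain many more properties.
SAMPLE_JOB_XML = """<configuration>
  <property><name>mapred.child.java.opts</name><value>-Xmx200m</value></property>
  <property><name>mapred.job.map.memory.mb</name><value>7000</value></property>
</configuration>"""

# Two of the overrides from the command line above.
cli_args = ["-Dmapred.child.java.opts=-Xmx7096m", "-Dmapred.job.map.memory.mb=7000"]

ignored = find_ignored(parse_d_options(cli_args), load_job_conf(SAMPLE_JOB_XML))
for key, (want, got) in sorted(ignored.items()):
    print(f"{key}: expected {want!r}, job ran with {got!r}")
```

      Running the same comparison against an iteration job and against the classification job would show which of the -D values were dropped in the final step.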

            People

            • Assignee:
              Suneel Marthi (smarthi)
            • Reporter:
              Stewart Whiting (stewh-uk)
            • Votes:
              0
            • Watchers:
              3
