Mahout / MAHOUT-1431

Comparison of Mahout 0.8 vs Mahout 0.9 in EMR


    Details

    • Type: Question
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.8, 0.9
    • Fix Version/s: 0.10.0
    • Component/s: Clustering
    • Labels:

      Description

      Hi all,

      I tested Mahout 0.8 and 0.9 on Amazon EMR, running k-means experiments with a large dataset as input in both versions.
      What I found is that Mahout 0.8 is faster than Mahout 0.9. In particular, 0.8 performs fewer iterations, and each k-means iteration in 0.8 is roughly twice as fast as in 0.9.

      The Hadoop version was 1.0.x, and the input was roughly 2 million data points with a dimensionality of 1800.
      The input parameters were exactly the same in both experiments, apart from the initialization, which was random in both cases. I understand that this may affect convergence (the number of iterations), but I am baffled that every iteration takes almost twice as long in 0.9 as in 0.8.

      Is this normal? Is this expected?

      Thank you in advance for your time.


            People

            • Assignee: Unassigned
            • Reporter: yannis_at yannis ats
            • Votes: 0
            • Watchers: 4

              Dates

              • Created:
                Updated:
                Resolved: