Mahout / MAHOUT-279

Make RandomSeedGenerator a M/R Job

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Later
    • Affects Version/s: 0.3
    • Fix Version/s: None
    • Component/s: Clustering
    • Labels:
      None

      Description

      Speed up random centroid selection for clustering using Map/Reduce.

      Increasing the scope of this issue.

      • RandomSeedGenerator could take a distance measure and a threshold, and use them during random insertion and eviction to increase the distance between any two chosen centroids.
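A minimal sketch of what threshold-aware insertion and eviction could look like, assuming the generator streams over the input once and keeps k candidates with reservoir sampling (Algorithm R). The class ThresholdReservoirSampler, its local DistanceMeasure interface, and the plain double[] vectors are all hypothetical and are not Mahout's API. The invariant it maintains is that every pair of retained centroids is at least threshold apart; the cost is that the result is no longer a uniform random sample, and a large threshold can leave fewer than k centroids.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: reservoir sampling (Algorithm R) with a minimum-distance check.
// Not Mahout's RandomSeedGenerator and not its DistanceMeasure interface.
final class ThresholdReservoirSampler {

  /** Hypothetical stand-in for a distance measure. */
  interface DistanceMeasure {
    double distance(double[] a, double[] b);
  }

  /** Euclidean distance between two vectors of equal length. */
  static final DistanceMeasure EUCLIDEAN = (a, b) -> {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  };

  private final int k;
  private final double threshold;
  private final DistanceMeasure measure;
  private final Random random;
  private final List<double[]> reservoir = new ArrayList<>();
  private long seen = 0;

  ThresholdReservoirSampler(int k, double threshold, DistanceMeasure measure, Random random) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    this.k = k;
    this.threshold = threshold;
    this.measure = measure;
    this.random = random;
  }

  /** True if candidate is at least threshold away from every retained vector except skipIndex. */
  private boolean farEnough(double[] candidate, int skipIndex) {
    for (int i = 0; i < reservoir.size(); i++) {
      if (i != skipIndex && measure.distance(candidate, reservoir.get(i)) < threshold) {
        return false;
      }
    }
    return true;
  }

  /** Offers one input vector from the stream. */
  void offer(double[] vector) {
    seen++;
    if (reservoir.size() < k) {
      // Filling phase: insert unless the vector is too close to one already kept.
      if (farEnough(vector, -1)) {
        reservoir.add(vector);
      }
      return;
    }
    // Algorithm R: pick a slot in [0, seen); only slots below k cause an eviction.
    long slot = (long) (random.nextDouble() * seen);
    // Evict only if the newcomer is far enough from the centroids that would remain.
    if (slot < k && farEnough(vector, (int) slot)) {
      reservoir.set((int) slot, vector);
    }
  }

  /** The centroids chosen so far (may be fewer than k if many candidates were rejected). */
  List<double[]> centroids() {
    return Collections.unmodifiableList(reservoir);
  }
}
```

The distance check runs only on vectors that the reservoir step would already insert, so the extra work is roughly k distance computations per accepted candidate rather than per input vector.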

        Activity

        Robin Anil created issue -
        Robin Anil made changes -
        Summary: Make RandomSeedGenerate a M/R Job → Make RandomSeedGenerator a M/R Job
        Sean Owen added a comment -

        Here's a different suggestion. The problem is efficiently picking a couple of vectors out of billions, and an M/R job seems like overkill for that.

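        This patch just seeks to random points in the file, syncs to the next sync marker, and reads from there. Unless the underlying implementation is awful, this should be super fast. The downside is that the choice is slightly biased, since a vector that follows a longer stretch of the file is more likely to be picked. We could fix that if needed.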

        I don't know if this works; is there a way to test it by reading real input?
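To make the "seek, sync, read" idea and its bias concrete, here is a minimal in-memory analogy. It is not the patch attached to this issue: a byte array of newline-terminated records stands in for a SequenceFile, and a '\n' stands in for the sync marker that Hadoop's SequenceFile.Reader.sync(long) skips forward to. The class RandomOffsetSampler and its methods are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical in-memory analogy of "seek to a random offset, sync, read".
// A '\n' plays the role of a SequenceFile sync marker; this is not the attached patch.
final class RandomOffsetSampler {
  private final byte[] data;
  private final Random random;

  RandomOffsetSampler(byte[] data, Random random) {
    if (data.length == 0) {
      throw new IllegalArgumentException("data must not be empty");
    }
    this.data = data;
    this.random = random;
  }

  /** Skips forward past the first '\n' at or after offset; wraps to 0 at end of data. */
  int syncFrom(int offset) {
    int i = offset;
    while (i < data.length && data[i] != '\n') {
      i++;
    }
    int start = i + 1;
    return start >= data.length ? 0 : start;
  }

  /** Reads the record beginning at start, up to (not including) the next '\n'. */
  String readRecord(int start) {
    int end = start;
    while (end < data.length && data[end] != '\n') {
      end++;
    }
    return new String(data, start, end - start, StandardCharsets.UTF_8);
  }

  /** Deterministic variant, useful for seeing which offsets map to which record. */
  String sampleAt(int offset) {
    return readRecord(syncFrom(offset));
  }

  /** Picks a uniformly random byte offset, then returns the next whole record. */
  String sample() {
    return sampleAt(random.nextInt(data.length));
  }
}
```

For the 14-byte input "x\nyyyyyyyyy\nz\n", offsets 0 and 1 return "yyyyyyyyy", offsets 2 through 11 return "z", and offsets 12 and 13 wrap around to "x". So "z" comes back about 10 times in 14 simply because it follows a long record, which is the kind of bias mentioned above. Calling sample() repeatedly to get several seeds can also return the same record more than once.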

        Sean Owen made changes -
        Attachment MAHOUT-279.patch [ 12435139 ]
        Sean Owen made changes -
        Fix Version/s 0.4 [ 12314396 ]
        Fix Version/s 0.3 [ 12314281 ]
        Sean Owen added a comment -

        Bah, it doesn't actually work in Hadoop, for reasons I don't quite get. Never mind.

        Robin Anil made changes -
        Description: Speedup Random Centroid Selection for clustering using Map/Reduce → Speedup Random Centroid Selection for clustering using Map/Reduce

        Increasing the scope of this issue.

        * Random Seed Generator could take a distance measure and a threshold and use that information during random eviction and insertion to increase the distance between two centroids
        Ted Dunning added a comment -

        Is this overlapping with the k-means++ stuff?

        Sean Owen added a comment -

        Am I right that this has stalled out, and won't make 0.4 at least?

        Sean Owen made changes -
        Fix Version/s 0.4 [ 12314396 ]
        Priority: Major [ 3 ] → Minor [ 4 ]
        Ted Dunning added a comment -

        Seems right to me (not for 0.4, that is).

        Jeff Eastman added a comment -

        Moving this from limbo to 0.5.

        Jeff Eastman made changes -
        Fix Version/s 0.5 [ 12315255 ]
        Sean Owen added a comment -

        What's the thinking here: is there a good use case for it, and should the patch be cleaned up? Or is this no longer interesting?

        Sean Owen added a comment -

        Am I right that this one is dead?

        Sean Owen made changes -
        Assignee Robin Anil [ robinanil ]
        Fix Version/s 0.5 [ 12315255 ]
        Sean Owen made changes -
        Status: Open [ 1 ] → Resolved [ 5 ]
        Resolution Later [ 7 ]
        Sean Owen made changes -
        Status: Resolved [ 5 ] → Closed [ 6 ]

          People

          • Assignee:
            Robin Anil
            Reporter:
            Robin Anil
          • Votes:
            0
            Watchers:
            0
