Mahout / MAHOUT-401

Use NamedVector in seq2sparse

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4
    • Fix Version/s: 0.4
    • Component/s: classic
    • Labels: None

    Description

      In seq2sparse, TFIDFPartialVectorReducer and TFPartialVectorReducer should write NamedVectors. It appears that, without labels on the vectors fed to k-means, the cluster dumper at the very least can no longer print the original document ids for the points in each cluster.

      See: http://lucene.472066.n3.nabble.com/where-are-the-points-in-each-cluster-kmeans-clusterdump-td838683.html#a845600

      I wonder whether this is also an issue in the code that generates vectors from Lucene indexes?
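
      To illustrate why the label matters, here is a minimal, self-contained sketch. It does not use Mahout's classes: `SimpleVector`, `LabeledVector`, and `describePoint` are hypothetical stand-ins invented for this example, and the document id `/docs/doc-1` is made up. The sketch only shows the general idea that a dumper-style tool can print a document id when the vector it receives carries a name, and has nothing to print when it does not.

```java
import java.util.Arrays;

public class NamedVectorSketch {

    // Hypothetical stand-in for a plain, unlabeled term vector (not a Mahout class).
    static class SimpleVector {
        final double[] values;

        SimpleVector(double[] values) {
            this.values = values;
        }

        @Override
        public String toString() {
            return Arrays.toString(values);
        }
    }

    // Hypothetical stand-in for a named vector: the same values plus a label,
    // here assumed to be the original document id.
    static class LabeledVector extends SimpleVector {
        final String name;

        LabeledVector(SimpleVector delegate, String name) {
            super(delegate.values);
            this.name = name;
        }
    }

    // What a dumper-like tool is able to print for one clustered point.
    static String describePoint(SimpleVector v) {
        if (v instanceof LabeledVector) {
            return ((LabeledVector) v).name + " = " + v;
        }
        // Without a label there is no way to recover the document id.
        return "(no id) = " + v;
    }

    public static void main(String[] args) {
        SimpleVector tf = new SimpleVector(new double[] {1.0, 0.0, 2.0});
        System.out.println(describePoint(tf));
        System.out.println(describePoint(new LabeledVector(tf, "/docs/doc-1")));
    }
}
```

      In this sketch the label is attached by whatever code emits the vector, which is the role the issue asks the two reducers to take on.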

      Attachments

        1. MAHOUT-401.patch
          30 kB
          Drew Farris
        2. MAHOUT-401.patch
          4 kB
          Drew Farris
        3. pv.patch
          3 kB
          Drew Farris

          People

            Assignee: Drew Farris (drew.farris)
            Reporter: Drew Farris (drew.farris)