  Mahout / MAHOUT-1549

Extracting tfidf-vectors by key

Details

    • Type: Question
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 0.7, 0.8, 0.9
    • Fix Version/s: 0.10.0
    • None

    Description

      Hi,
      I have about 200,000 tfidf-vectors and need to extract 500 of them, for which I have the keys. Is there an option that lets me take the output of mahout seqdumper and turn it back into a sequence file that I can use with trainnb/testnb? The tfidf sequence files use the Text class for the keys and the VectorWritable class for the values. I tried
      https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java
      with different settings, but the output always uses the Text class for both key and value, which cannot be used by trainnb and testnb.

      I posted this question on:

      http://stackoverflow.com/questions/23502362/extracting-tfidf-vectors-by-key-without-destroying-the-fileformat

      I am asking here because I have seen similar questions on Stack Overflow that were asked last year and still have no answer.

      I really need this information, so if you know anything, please tell me.

      Regards,
      Richard
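
      One possible approach, sketched below with plain Java collections rather than the real Hadoop classes: read the records once, keep only those whose key is in a set of wanted keys, and pass each value through untouched so its type never changes. The names VectorKeyFilter and copyMatching are hypothetical, and the sample keys and weights are made up. In an actual job the records would presumably come from a Hadoop SequenceFile.Reader over the tfidf-vectors file, and the sink would append to a SequenceFile.Writer created with Text as the key class and VectorWritable as the value class. That wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

public class VectorKeyFilter {

    /**
     * Streams over (key, value) records once and forwards every record whose
     * key is contained in wantedKeys. Values are handed to the sink unchanged,
     * so whatever type they had on input is the type they have on output.
     * Returns the number of records forwarded.
     */
    public static <K, V> int copyMatching(Iterable<Map.Entry<K, V>> records,
                                          Set<K> wantedKeys,
                                          BiConsumer<K, V> sink) {
        int copied = 0;
        for (Map.Entry<K, V> record : records) {
            if (wantedKeys.contains(record.getKey())) {
                sink.accept(record.getKey(), record.getValue());
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        // Stand-in for the tfidf-vectors file: document key -> weights (made-up data).
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("doc-1", new double[] {0.0, 1.2});
        vectors.put("doc-2", new double[] {0.7, 0.0});
        vectors.put("doc-3", new double[] {0.3, 0.9});

        // The keys to extract (500 of them in the question, two here).
        Set<String> wanted = new HashSet<>(Arrays.asList("doc-1", "doc-3"));

        // Stand-in for the output writer: just remember which keys were kept.
        List<String> keptKeys = new ArrayList<>();
        int copied = copyMatching(vectors.entrySet(), wanted, (k, v) -> keptKeys.add(k));

        System.out.println(copied + " vectors kept: " + keptKeys);
    }
}
```

      Holding the 500 keys in a hash set keeps the lookup cheap, so the whole extraction is a single pass over the 200,000 vectors. Going through the text output of seqdumper is avoided entirely, which is what would otherwise turn the values into Text.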

          People

            Assignee: Unassigned
            Reporter: Richard Scharrer (Pilgrim)
            Votes: 0
            Watchers: 3