Mahout / MAHOUT-747

Entropy implementation in Map/Reduce

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.6
    • Component/s: Math
    • Labels:
      None

      Description

      Hi again,

      Because I work a lot with entropy and information gain ratio, I want to implement distributed algorithms for them.

      For now, this issue covers only entropy.
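
      For reference, a sketch of the quantity in question, written in terms of key counts. The symbols are assumptions made for illustration and are not taken from the patch: c_k is the number of occurrences of key k, N is the total number of elements, and the logarithm is taken to base 2.

```latex
H = -\sum_{k} \frac{c_k}{N} \log_2 \frac{c_k}{N}
  = \log_2 N - \frac{1}{N} \sum_{k} c_k \log_2 c_k,
\qquad N = \sum_{k} c_k
```

      The second form only needs the per-key counts, their total N, and the sum of c_k log2 c_k, which is why counting keys first is a natural fit for Map/Reduce.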

      Some questions:

      • Which package do the classes belong in? I put them in 'org.apache.mahout.math.stats' for now, but I don't know if this is right, because they are components of information retrieval.
      • Entropy only reads a set of elements. As input I took a sequence file with keys of type Text and values of any type, because I only work with the keys. Is this the best practice?
      • Is there a generic solution, so that the keys can be of any type that implements Writable?

      Hadoop has a TokenCounterMapper, which emits each token of the input value with an IntWritable(1). I added a KeyCounterMapper to 'org.apache.mahout.common.mapreduce' which does the same with the keys.
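
      To illustrate the idea, here is a minimal sketch in plain Java with no Hadoop dependency. It is not the code from the patch: the class name KeyEntropySketch and the methods countKeys and entropy are hypothetical names chosen for this example. It counts each key, as if (key, 1) were emitted per record and summed per key, and then computes the entropy in bits from those counts. The key type is generic, which hints at how a solution independent of Text could look.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the classes from MAHOUT-747.patch.
class KeyEntropySketch {

    // Counting step: equivalent to emitting (key, 1) for every record
    // and summing the ones per key. Works for any key type K.
    static <K> Map<K, Long> countKeys(Iterable<K> keys) {
        Map<K, Long> counts = new HashMap<>();
        for (K key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Entropy in bits from per-key counts, using
    // H = log2(N) - (1/N) * sum(c * log2(c)), where N is the sum of all counts.
    static double entropy(Map<?, Long> counts) {
        long total = 0;
        double sumCLogC = 0.0;
        for (long c : counts.values()) {
            total += c;
            sumCLogC += c * log2(c);
        }
        if (total == 0) {
            return 0.0; // assumption: define the entropy of an empty input as 0
        }
        return log2(total) - sumCLogC / total;
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "a", "c", "a", "b");
        System.out.println(entropy(countKeys(keys)));
    }
}
```

      In a real job, the counting would be spread over mappers and reducers, and only the count total and the sum of c * log2(c) would have to be combined at the end.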

      I will attach my patch soon.

      Regards, Christoph.

      1. MAHOUT-747.patch
        43 kB
        Christoph Nagel

        Activity

        Field: Original Value -> New Value
        Sean Owen made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Sean Owen made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Assignee: Sean Owen [ srowen ]
        Fix Version/s: 0.6 [ 12316364 ]
        Resolution: Fixed [ 1 ]
        Christoph Nagel made changes -
        Attachment: MAHOUT-747.patch [ 12485162 ]
        Christoph Nagel made changes -
        Attachment: MAHOUT-747.patch [ 12484876 ]
        Christoph Nagel made changes -
        Attachment: MAHOUT-747.patch [ 12484876 ]
        Christoph Nagel made changes -
        Attachment: MAHOUT-747.patch [ 12484615 ]
        Christoph Nagel made changes -
        Attachment: MAHOUT-747.patch [ 12484615 ]
        Christoph Nagel made changes -
        Status: Patch Available [ 10002 ] -> Open [ 1 ]
        Christoph Nagel made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Christoph Nagel created issue -

          People

          • Assignee:
            Sean Owen
            Reporter:
            Christoph Nagel
          • Votes:
            0
            Watchers:
            0
