Mahout / MAHOUT-317

Collocations: Eliminate in-memory frequency calculation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3
    • Fix Version/s: 0.3
    • Component/s: None
    • Labels: None

      Description

      see: http://www.lucidimagination.com/search/document/ae484d53e969250e/who_owns_mahout_bucket_on_s3

      The collocation code currently uses in-memory maps in the CollocCombiner and CollocReducer to calculate frequencies. If a large number of ngrams exist for any given subgram, these maps can cause the process to exceed the available heap space.

      Convert the code to use a composite key and secondary sort, so that frequency calculations no longer need an in-memory map.
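The idea behind the composite key / secondary sort approach can be illustrated with a minimal sketch. It is not taken from the attached patches and does not use the Hadoop API: `GramKey`, `countFrequencies`, and `demo` are hypothetical names, and the sort is simulated in memory here, whereas in a real job the framework would perform it during the shuffle (typically via a custom key type, partitioner, and grouping comparator). What it shows is that once records arrive ordered by (subgram, ngram), equal ngrams are adjacent, so a single running counter replaces the map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    /** Hypothetical composite key: subgram is the grouping part, ngram the secondary-sort part. */
    public static final class GramKey {
        public final String subgram;
        public final String ngram;

        public GramKey(String subgram, String ngram) {
            this.subgram = subgram;
            this.ngram = ngram;
        }
    }

    /** Order by subgram first, then by ngram, mimicking what a secondary sort would deliver. */
    public static final Comparator<GramKey> SORT_ORDER =
        Comparator.comparing((GramKey k) -> k.subgram).thenComparing(k -> k.ngram);

    /**
     * Counts how often each (subgram, ngram) pair occurs.
     * The counting pass keeps only the current key and one counter, never a map.
     */
    public static List<String> countFrequencies(List<GramKey> emitted) {
        // Stand-in for the shuffle/sort phase (an assumption of this sketch).
        List<GramKey> sorted = new ArrayList<>(emitted);
        sorted.sort(SORT_ORDER);

        List<String> out = new ArrayList<>();
        GramKey current = null;
        int count = 0;
        for (GramKey key : sorted) {
            // A key change means the previous run of identical keys is complete.
            if (current != null && SORT_ORDER.compare(key, current) != 0) {
                out.add(current.subgram + "\t" + current.ngram + "\t" + count);
                count = 0;
            }
            current = key;
            count++;
        }
        if (current != null) {
            out.add(current.subgram + "\t" + current.ngram + "\t" + count);
        }
        return out;
    }

    /** Small made-up input: each bigram is emitted once per subgram (unigram) it contains. */
    public static List<String> demo() {
        List<GramKey> emitted = Arrays.asList(
            new GramKey("new", "new york"),
            new GramKey("york", "new york"),
            new GramKey("new", "new jersey"),
            new GramKey("new", "new york"),
            new GramKey("york", "new york"));
        return countFrequencies(emitted);
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

With the made-up input in `demo()`, the result lists three tab-separated records: `new / new jersey / 1`, `new / new york / 2`, and `york / new york / 2`. Memory use in the counting loop is constant regardless of how many distinct ngrams share a subgram, which is the property the issue is after.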

        Attachments

        1. MAHOUT-317.patch
          26 kB
          Drew Farris
        2. MAHOUT-317.patch
          35 kB
          Drew Farris
        3. MAHOUT-317.patch
          45 kB
          Drew Farris
        4. MAHOUT-317.patch
          45 kB
          Drew Farris

            People

            • Assignee: Drew Farris (drew.farris)
            • Reporter: Drew Farris (drew.farris)
