Mahout / MAHOUT-317

Collocations: Eliminate in-memory frequency calculation

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3
    • Fix Version/s: 0.3
    • Component/s: None
    • Labels: None

    Description

      see: http://www.lucidimagination.com/search/document/ae484d53e969250e/who_owns_mahout_bucket_on_s3

      The collocation code currently uses in-memory maps in the CollocCombiner and CollocReducer to perform frequency calculations. This can cause the process to exceed the available heap space when a large number of ngrams exist for a given subgram.

      Convert the code to use a composite key and secondary sort, so that the frequency calculations no longer need an in-memory map.
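
      The composite key / secondary sort idea can be sketched in plain Java, without any Hadoop dependency. This is only an illustration under stated assumptions, not the code from the attached patch: the names SecondarySortSketch, GramKey and countNgrams are hypothetical, and the explicit sort call stands in for the sort that the MapReduce framework would perform between map and reduce. Because records for the same (subgram, ngram) pair arrive next to each other, a single running counter is enough and no per-subgram map is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: not Mahout's actual classes.
final class SecondarySortSketch {

    /** Hypothetical composite key: subgram is the grouping part, ngram the secondary sort part. */
    static final class GramKey {
        final String subgram;
        final String ngram;

        GramKey(String subgram, String ngram) {
            this.subgram = subgram;
            this.ngram = ngram;
        }

        boolean sameAs(GramKey other) {
            return other != null
                && subgram.equals(other.subgram)
                && ngram.equals(other.ngram);
        }
    }

    // Order by subgram first (grouping), then by ngram (secondary sort).
    static final Comparator<GramKey> SORT_ORDER =
        Comparator.comparing((GramKey k) -> k.subgram)
                  .thenComparing((GramKey k) -> k.ngram);

    /**
     * Counts occurrences of each ngram per subgram using one running counter.
     * Returns lines of the form "subgram<TAB>ngram<TAB>count" in sorted order.
     */
    static List<String> countNgrams(List<GramKey> emitted) {
        List<GramKey> sorted = new ArrayList<>(emitted);
        sorted.sort(SORT_ORDER); // stands in for the framework's sort phase

        List<String> out = new ArrayList<>();
        GramKey current = null;
        int count = 0;
        for (GramKey key : sorted) {
            if (key.sameAs(current)) {
                count++; // same pair as the previous record: keep counting
            } else {
                if (current != null) {
                    out.add(format(current, count)); // pair finished: emit it
                }
                current = key;
                count = 1;
            }
        }
        if (current != null) {
            out.add(format(current, count)); // emit the final pair
        }
        return out;
    }

    private static String format(GramKey key, int count) {
        return key.subgram + "\t" + key.ngram + "\t" + count;
    }

    public static void main(String[] args) {
        List<GramKey> emitted = Arrays.asList(
            new GramKey("best", "best of"),
            new GramKey("of", "best of"),
            new GramKey("best", "best times"),
            new GramKey("best", "best of"));
        countNgrams(emitted).forEach(System.out::println);
    }
}
```

      In this sketch the memory used while counting is constant per subgram: only the current key and its counter are held, regardless of how many distinct ngrams share that subgram. In a real job the sorted list would not be materialized either, since the framework streams the sorted records to the reducer.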

      Attachments

        1. MAHOUT-317.patch
          26 kB
          Drew Farris
        2. MAHOUT-317.patch
          35 kB
          Drew Farris
        3. MAHOUT-317.patch
          45 kB
          Drew Farris
        4. MAHOUT-317.patch
          45 kB
          Drew Farris

          People

            Assignee: Drew Farris
            Reporter: Drew Farris