Mahout / MAHOUT-415

Lucene filter for Collocations

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3, 0.4, 0.5
    • Fix Version/s: 0.6
    • Component/s: None

      Description

      Collocations generated using Mahout could be used to form a whitelist of terms to index into a Lucene index. This patch will provide a way to generate a serialized BloomFilter from CollocationsOutput, along with a Lucene filter that takes a BloomFilter and emits only the tokens that are members of that filter. A set of interesting collocations could then be pre-computed for a corpus, and the documents indexed using only those collocations.
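
      The idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in the attached patch: `SimpleBloomFilter` and `CollocationWhitelist` are hypothetical names, the hashing scheme (FNV-1a with double hashing) and the byte layout are arbitrary choices, and a `List<String>` stands in for a Lucene token stream. It shows the three pieces the description mentions: building a filter from collocations, serializing it, and emitting only the tokens that are members.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for a Bloom filter; NOT the class used by the patch. */
class SimpleBloomFilter {
    private final int numBits;
    private final int numHashes;
    private final BitSet bits;

    SimpleBloomFilter(int numBits, int numHashes) {
        this(numBits, numHashes, new BitSet(numBits));
    }

    private SimpleBloomFilter(int numBits, int numHashes, BitSet bits) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = bits;
    }

    /** 32-bit FNV-1a over the bytes, starting from the given seed. */
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    /** Double hashing: the i-th bit position is (h1 + i * h2) mod numBits. */
    private int position(byte[] data, int i) {
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String term) {
        byte[] data = term.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(data, i));
        }
    }

    /** True for every added term; may also be true for a few terms never added. */
    boolean mightContain(String term) {
        byte[] data = term.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(data, i))) {
                return false;
            }
        }
        return true;
    }

    /** Assumed layout: numBits (int), numHashes (int), then the raw bit set bytes. */
    byte[] toBytes() {
        byte[] raw = bits.toByteArray();
        return ByteBuffer.allocate(8 + raw.length)
                .putInt(numBits).putInt(numHashes).put(raw).array();
    }

    static SimpleBloomFilter fromBytes(byte[] serialized) {
        ByteBuffer buf = ByteBuffer.wrap(serialized);
        int numBits = buf.getInt();
        int numHashes = buf.getInt();
        byte[] raw = new byte[buf.remaining()];
        buf.get(raw);
        return new SimpleBloomFilter(numBits, numHashes, BitSet.valueOf(raw));
    }
}

/** Hypothetical whitelist step: keeps only tokens the filter reports as members. */
class CollocationWhitelist {
    static List<String> keepMembers(List<String> tokens, SimpleBloomFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (filter.mightContain(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

      In an actual Lucene analysis chain, the membership check would presumably live inside a `TokenFilter` subclass whose `incrementToken()` skips tokens that are not in the filter, with an upstream step producing multi-word tokens to test. Because a Bloom filter can return false positives, a small number of non-collocation tokens may still be indexed; it never drops a collocation that was added.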

        Attachments

        1. MAHOUT-415.patch (9 kB, Drew Farris)

            People

            • Assignee: Drew Farris
            • Reporter: Drew Farris
            • Votes: 2
            • Watchers: 1