Mahout / MAHOUT-415

Lucene filter for Collocations

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3, 0.4, 0.5
    • Fix Version/s: 0.6
    • None

    Description

      Collocations generated using Mahout could be used to form a whitelist of terms to index into a Lucene index. This patch provides a way to generate a serialized BloomFilter from CollocationsOutput, together with a Lucene filter that takes a BloomFilter and emits only the tokens that are members of that filter. A set of interesting collocations can then be pre-computed for a corpus, and the documents indexed using only those collocations.
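
      The idea described above can be sketched in plain Java. This is a minimal illustration, not the code from MAHOUT-415.patch: the class and method names (CollocationWhitelistSketch, SimpleBloomFilter, keepMembers) are hypothetical, the Bloom filter is a toy stand-in written here rather than the BloomFilter class the patch serializes, and a plain list of strings stands in for a Lucene token stream. It assumes each token is already a complete collocation string such as "machine learning".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of whitelist filtering with a Bloom filter. */
class CollocationWhitelistSketch {

    /** Toy Bloom filter: each key sets numHashes bit positions. */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this(new BitSet(numBits), numBits, numHashes);
        }

        private SimpleBloomFilter(BitSet bits, int numBits, int numHashes) {
            this.bits = bits;
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(position(key, i));
            }
        }

        /** Never false for an added key; may be true for a key never added. */
        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(position(key, i))) {
                    return false;
                }
            }
            return true;
        }

        /** Serialize only the bit array; sizes are supplied again on load. */
        byte[] toBytes() {
            return bits.toByteArray();
        }

        static SimpleBloomFilter fromBytes(byte[] data, int numBits, int numHashes) {
            return new SimpleBloomFilter(BitSet.valueOf(data), numBits, numHashes);
        }

        /** Double hashing: position i is (h1 + i * h2) mod numBits. */
        private int position(String key, int i) {
            byte[] data = key.getBytes(StandardCharsets.UTF_8);
            int h1 = fnv1a(data, 0x811C9DC5);
            int h2 = fnv1a(data, 0x01000193) | 1; // force an odd step
            return Math.floorMod(h1 + i * h2, numBits);
        }

        /** FNV-1a style hash over the bytes, starting from the given seed. */
        private static int fnv1a(byte[] data, int seed) {
            int hash = seed;
            for (byte b : data) {
                hash ^= (b & 0xff);
                hash *= 16777619;
            }
            return hash;
        }
    }

    /** Emit only the tokens that the filter reports as members. */
    static List<String> keepMembers(List<String> tokens, SimpleBloomFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (filter.mightContain(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Step 1: build the whitelist from pre-computed collocations.
        SimpleBloomFilter whitelist = new SimpleBloomFilter(1024, 3);
        whitelist.add("machine learning");
        whitelist.add("bloom filter");

        // Step 2: serialize, then reload as an indexing job would.
        SimpleBloomFilter loaded =
            SimpleBloomFilter.fromBytes(whitelist.toBytes(), 1024, 3);

        // Step 3: pass candidate tokens through the filter.
        List<String> tokens =
            Arrays.asList("machine learning", "of the", "bloom filter");
        System.out.println(keepMembers(tokens, loaded));
    }
}
```

      Because a Bloom filter can report false positives, a whitelist built this way may occasionally let through a token that was never added; it never drops a token that was added. Sizing the bit array and hash count for the expected number of collocations keeps that rate low.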

      Attachments

        1. MAHOUT-415.patch
          9 kB
          Drew Farris

          People

            Assignee: Drew Farris
            Reporter: Drew Farris
            Votes: 2
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: