MAHOUT-415: Lucene filter for Collocations


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3, 0.4, 0.5
    • Fix Version/s: 0.6
    • None

    Description

      Collocations generated using Mahout could be used to form a whitelist of terms to index into a Lucene index. This patch will provide a way to generate a serialized BloomFilter from CollocationsOutput and a Lucene filter that will take a BloomFilter and emit tokens that are members of that filter. This would allow a set of interesting collocations to be pre-computed for a corpus and then allow the documents to be indexed using only those collocations.
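
      The whitelist idea described above can be illustrated with a minimal, self-contained sketch. It does not use Mahout's or Lucene's actual classes: `SimpleBloomFilter`, `keepMembers`, and the sample collocations are hypothetical names invented for illustration, and the hashing scheme (two FNV-1a-style hashes combined by double hashing) is an assumption, not necessarily what the patch does. In a real analyzer the membership check would presumably sit inside a Lucene `TokenFilter` rather than a plain list loop.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: a tiny Bloom filter used as a collocation whitelist.
class CollocationWhitelistSketch {

    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this(new BitSet(numBits), numBits, numHashes);
        }

        private SimpleBloomFilter(BitSet bits, int numBits, int numHashes) {
            this.bits = bits;
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Set one bit per hash function for the given term.
        void add(String term) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(term, i));
            }
        }

        // True for every added term; may also be true for others (false positives).
        boolean mightContain(String term) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(term, i))) {
                    return false;
                }
            }
            return true;
        }

        // Serialize only the bit array; sizes must be supplied again on load.
        byte[] toBytes() {
            return bits.toByteArray();
        }

        static SimpleBloomFilter fromBytes(byte[] data, int numBits, int numHashes) {
            return new SimpleBloomFilter(BitSet.valueOf(data), numBits, numHashes);
        }

        // Double hashing: index_i = (h1 + i * h2) mod numBits.
        private int index(String term, int i) {
            byte[] data = term.getBytes(StandardCharsets.UTF_8);
            int h1 = fnv1a(data, 0x811C9DC5);
            int h2 = fnv1a(data, 0x01000193);
            return Math.floorMod(h1 + i * h2, numBits);
        }

        private static int fnv1a(byte[] data, int seed) {
            int h = seed;
            for (byte b : data) {
                h ^= (b & 0xff);
                h *= 16777619;
            }
            return h;
        }
    }

    // Emit only the tokens that the filter reports as members, preserving order.
    static List<String> keepMembers(List<String> tokens, SimpleBloomFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (filter.mightContain(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Pre-computed "interesting" collocations (made-up sample data).
        SimpleBloomFilter whitelist = new SimpleBloomFilter(1 << 16, 3);
        whitelist.add("new york");
        whitelist.add("machine learning");

        // Round-trip through bytes to mimic loading a serialized filter at index time.
        SimpleBloomFilter loaded =
            SimpleBloomFilter.fromBytes(whitelist.toBytes(), 1 << 16, 3);

        List<String> tokens = Arrays.asList("in the", "new york", "of a", "machine learning");
        System.out.println(keepMembers(tokens, loaded));
    }
}
```

      Because a Bloom filter can report false positives but never false negatives, every whitelisted collocation is kept, while a small fraction of unwanted tokens may slip through; the bit count and number of hashes control that rate.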


          People

            Assignee: Drew Farris (drew.farris)
            Reporter: Drew Farris (drew.farris)
            Votes: 2
            Watchers: 1
