Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3, 0.4, 0.5
    • Fix Version/s: 0.6
    • Component/s: None

      Description

      Collocations generated using Mahout could be used to form a whitelist of terms to index into a Lucene index. This patch will provide a way to generate a serialized BloomFilter from CollocationsOutput and a Lucene filter that will take a BloomFilter and emit tokens that are members of that filter. This would allow a set of interesting collocations to be pre-computed for a corpus and then allow the documents to be indexed using only those collocations.
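
The whitelist idea described above can be illustrated with a minimal, self-contained Java sketch. It is not the patch attached to this issue and uses neither Mahout's BloomFilter nor Lucene's TokenFilter API. Every name in it (CollocationWhitelistSketch, SimpleBloomFilter, bigrams, keepWhitelisted) is hypothetical, and the filter size and hash count are arbitrary values chosen for illustration. It assumes collocations are bigrams joined by a single space.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: none of these names come from Mahout or Lucene.
class CollocationWhitelistSketch {

    /** Minimal Bloom filter over strings, using double hashing. */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this(new BitSet(numBits), numBits, numHashes);
        }

        private SimpleBloomFilter(BitSet bits, int numBits, int numHashes) {
            this.bits = bits;
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(String term) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(term, i));
            }
        }

        /** False means definitely absent; true means probably present. */
        boolean mightContain(String term) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(term, i))) {
                    return false;
                }
            }
            return true;
        }

        /** Serialize only the bit array; the sizes must be stored alongside it. */
        byte[] toBytes() {
            return bits.toByteArray();
        }

        static SimpleBloomFilter fromBytes(byte[] data, int numBits, int numHashes) {
            return new SimpleBloomFilter(BitSet.valueOf(data), numBits, numHashes);
        }

        // i-th bit position derived from two seeded FNV-1a style hashes.
        private int index(String term, int i) {
            byte[] data = term.getBytes(StandardCharsets.UTF_8);
            int h1 = fnv1a(data, 0x811C9DC5);
            int h2 = fnv1a(data, 0x9747B28C);
            return Math.floorMod(h1 + i * h2, numBits);
        }

        private static int fnv1a(byte[] data, int seed) {
            int h = seed;
            for (byte b : data) {
                h ^= (b & 0xff);
                h *= 0x01000193;
            }
            return h;
        }
    }

    /** Join each adjacent token pair with a space to form candidate collocations. */
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    /** Emit only the candidates that the Bloom filter reports as members. */
    static List<String> keepWhitelisted(List<String> candidates, SimpleBloomFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (filter.mightContain(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Step 1: build the whitelist from pre-computed collocations and serialize it.
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3);
        filter.add("machine learning");
        filter.add("bloom filter");
        byte[] serialized = filter.toBytes();

        // Step 2: at indexing time, restore the filter and keep only whitelisted bigrams.
        SimpleBloomFilter restored = SimpleBloomFilter.fromBytes(serialized, 1024, 3);
        List<String> tokens = Arrays.asList("a", "bloom", "filter", "for", "machine", "learning");
        System.out.println(keepWhitelisted(bigrams(tokens), restored));
    }
}
```

Because a Bloom filter can return false positives but never false negatives, every whitelisted collocation is always emitted, while a small fraction of other terms may slip through. A larger bit array or a different number of hash functions trades memory against that false-positive rate.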

        Activity

        No work has yet been logged on this issue.

          People

          • Assignee: Drew Farris
          • Reporter: Drew Farris
          • Votes: 2
          • Watchers: 1

            Dates

            • Created:
              Updated:
              Resolved: