Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1, 4.0-ALPHA
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      The co-occurrence filter to be developed here will output sets of tokens that co-occur within a given window over a token stream.
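A minimal plain-Java sketch of the core operation, assuming the token stream has already been reduced to a `List<String>` in position order. It is not a Lucene `TokenFilter`, and the class and method names are hypothetical. Here a set of tokens is assumed to co-occur within a window of size `window` when its first and last positions are fewer than `window` apart; anchoring each set at its first position keeps overlapping windows from emitting the same set twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch, not Lucene API: tokens are indexed by position.
// A set of positions co-occurs within a window of size `window` when its first
// and last positions are fewer than `window` apart.
class CoOccurrence {

    /** Returns every token set of exactly k tokens that fits in a window of `window` positions. */
    static List<List<String>> coOccurring(List<String> tokens, int window, int k) {
        List<List<String>> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Anchor each set at its first position so overlapping windows
            // never emit the same set twice.
            int end = Math.min(tokens.size(), start + window); // exclusive bound
            List<String> current = new ArrayList<>();
            current.add(tokens.get(start));
            extend(tokens, start + 1, end, k, current, out);
        }
        return out;
    }

    // Recursively adds later positions (in positional order) until the set has k tokens.
    private static void extend(List<String> tokens, int from, int end, int k,
                               List<String> current, List<List<String>> out) {
        if (current.size() == k) {
            out.add(new ArrayList<>(current));
            return;
        }
        for (int i = from; i < end; i++) {
            current.add(tokens.get(i));
            extend(tokens, i + 1, end, k, current, out);
            current.remove(current.size() - 1); // backtrack
        }
    }
}
```

For the tokens `a b c d`, `coOccurring(tokens, 3, 2)` yields `[a, b], [a, c], [b, c], [b, d], [c, d]`; `(a, d)` is excluded because it spans four positions. When `k` equals the window size, the result is the positional shingles, e.g. `[a, b, c], [b, c, d]` for a window of 3.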

      These token sets can be ordered either lexically (to allow order-independent matching/counting) or positionally (e.g. sliding windows of positionally ordered co-occurring terms that include all terms in the window are called n-grams or shingles).
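The two orderings can be sketched in plain Java as follows (hypothetical names, not Lucene API). Lexical ordering sorts a copy of each set, so sets containing the same terms compare equal regardless of where the terms occurred. Shingles are shown as the positional special case in which a set contains every term of a contiguous `n`-token window.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names (TokenSetOrdering, TokenSetOrder) chosen for illustration only.
enum TokenSetOrdering { LEXICAL, POSITIONAL }

class TokenSetOrder {

    /** Orders a co-occurring token set, given in positional order, as requested. */
    static List<String> order(List<String> positional, TokenSetOrdering ordering) {
        List<String> copy = new ArrayList<>(positional);
        if (ordering == TokenSetOrdering.LEXICAL) {
            // Order-independent key: "quick brown" and "brown quick" become equal.
            Collections.sort(copy);
        }
        return copy; // POSITIONAL: keep the original token order
    }

    /** Shingles: positionally ordered sets containing every term of a contiguous n-token window. */
    static List<List<String>> shingles(List<String> tokens, int n) {
        List<List<String>> out = new ArrayList<>();
        for (int start = 0; start + n <= tokens.size(); start++) {
            out.add(new ArrayList<>(tokens.subList(start, start + n)));
        }
        return out;
    }
}
```

`order(List.of("quick", "brown"), LEXICAL)` and `order(List.of("brown", "quick"), LEXICAL)` both return `[brown, quick]`, while `POSITIONAL` keeps the two distinct.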

      The parameters to this filter will be:

      • window size: this can be a fixed sequence length, a sentence or paragraph context (which requires sentence/paragraph segmentation, not yet available in Lucene), or the entire token stream (full field width)
      • minimum number of co-occurring terms: >= 2
      • maximum number of co-occurring terms: <= window size
      • token set ordering (lexical or positional)
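One way to capture these parameters is a small validated value type, sketched here as a Java 16+ record. The names are hypothetical, a whole-stream window is represented by an assumed `WHOLE_STREAM` sentinel, and sentence/paragraph windows are left out because they depend on segmentation that is not yet available. The constructor enforces the constraints listed above, plus the implied `minTerms <= maxTerms`.

```java
// Hypothetical parameter holder for the proposed filter; names are illustrative only.
enum SetOrdering { LEXICAL, POSITIONAL }

record CoOccurrenceParams(int windowSize, int minTerms, int maxTerms, SetOrdering ordering) {

    /** Assumed sentinel meaning "the entire token stream" (full field width). */
    static final int WHOLE_STREAM = Integer.MAX_VALUE;

    // Compact constructor: validates the constraints from the parameter list.
    CoOccurrenceParams {
        if (minTerms < 2) throw new IllegalArgumentException("minTerms must be >= 2");
        if (maxTerms < minTerms) throw new IllegalArgumentException("maxTerms must be >= minTerms");
        if (maxTerms > windowSize) throw new IllegalArgumentException("maxTerms must be <= windowSize");
        if (ordering == null) throw new IllegalArgumentException("ordering is required");
    }
}
```

For example, `new CoOccurrenceParams(5, 2, 3, SetOrdering.LEXICAL)` is accepted, while a `maxTerms` of 4 with a `windowSize` of 3 is rejected.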

      One use case for co-occurring token sets is as candidates for collocations.
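As an illustration of the collocation use case, the sketch below counts lexically ordered pairs that co-occur within a window (plain Java, hypothetical names, pairs only). The counts are raw candidates; ranking them, for example with an association measure, would be a separate step.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts lexically ordered token pairs whose positions are
// fewer than `window` apart, as raw collocation candidates.
class CollocationCandidates {

    static Map<String, Integer> pairCounts(List<String> tokens, int window) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        for (int i = 0; i < tokens.size(); i++) {
            for (int j = i + 1; j < Math.min(tokens.size(), i + window); j++) {
                String a = tokens.get(i), b = tokens.get(j);
                // Lexical ordering merges "tea strong" and "strong tea" into one candidate.
                String key = a.compareTo(b) <= 0 ? a + " " + b : b + " " + a;
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For `strong tea and tea strong` with a window of 2, both `and tea` and `strong tea` are counted twice, because lexical ordering folds the reversed occurrences into the same key.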

            People

              Assignee: Unassigned
              Reporter: Steven Rowe (sarowe)
              Votes: 0
              Watchers: 4