  Lucene - Core / LUCENE-1320

ShingleMatrixFilter, a three dimensional permutating shingle filter

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.3.2
    • Fix Version/s: 2.4
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: Patch Available

    Description

      A shingle filter backed by a column-focused matrix that creates all permutations of shingle tokens in three dimensions, i.e. it handles multi-token synonyms.

      In some cases it could, for instance, be used to replace 0-slop phrase queries with something faster, such as a single term query on the corresponding shingle.
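      A minimal sketch of the phrase-query idea, assuming the index holds shingles of 2 to 3 tokens joined with `_` (the separator seen in this issue's test output): an exact phrase whose token count falls inside that range maps to exactly one shingle term, which could then be searched with a plain `TermQuery` instead of a phrase query. `PhraseToShingle` and `shingleTermFor` are hypothetical names, not part of the patch, and the query text is assumed to be analyzed the same way as the indexed text.

```java
import java.util.Optional;

public class PhraseToShingle {

  /**
   * Hypothetical helper: maps an exact (0-slop) phrase to the single shingle term
   * an index built with shingles of minSize..maxSize tokens would contain for it.
   */
  static Optional<String> shingleTermFor(String[] phraseTokens, int minSize, int maxSize, char spacer) {
    if (phraseTokens.length < minSize || phraseTokens.length > maxSize) {
      // Phrase length is not covered by the indexed shingle sizes: keep the phrase query.
      return Optional.empty();
    }
    // Join the tokens the same way the shingles were joined at index time.
    return Optional.of(String.join(String.valueOf(spacer), phraseTokens));
  }

  public static void main(String[] args) {
    System.out.println(shingleTermFor(new String[] {"hello", "world"}, 2, 3, '_'));
    System.out.println(shingleTermFor(new String[] {"a", "b", "c", "d"}, 2, 3, '_'));
  }
}
```

      Running `main` prints `Optional[hello_world]` for the two-token phrase and `Optional.empty` for the four-token one, which falls outside the assumed shingle sizes.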

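      // Token matrix, shown by term text. Column 0 holds "hello" or the
      // multi-token synonym "greetings and salutations"; column 1 holds
      // "world" or its synonyms "earth" and "tellus".
      String[][][] matrix = {
        {{"hello"}, {"greetings", "and", "salutations"}},
        {{"world"}, {"earth"}, {"tellus"}}
      };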
      

      The filter passes the following test when configured for shingles of 2 to 3 tokens:

      // ts: the filter under test, built from the matrix above
      assertNext(ts, "hello_world");
      assertNext(ts, "greetings_and");
      assertNext(ts, "greetings_and_salutations");
      assertNext(ts, "and_salutations");
      assertNext(ts, "and_salutations_world");
      assertNext(ts, "salutations_world");
      assertNext(ts, "hello_earth");
      assertNext(ts, "and_salutations_earth");
      assertNext(ts, "salutations_earth");
      assertNext(ts, "hello_tellus");
      assertNext(ts, "and_salutations_tellus");
      assertNext(ts, "salutations_tellus");
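      The twelve expected tokens are exactly the distinct 2- and 3-token shingles of every path through the matrix: pick one row per column, concatenate the rows, then shingle the resulting token sequence. The naive sketch below recomputes that set by brute force. It is an illustration with hypothetical names (`ShingleMatrixSketch`, `shingles`, `expand`), not the patch's algorithm, and it does not reproduce the filter's emission order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ShingleMatrixSketch {

  /** Brute force: expand every path through the matrix, then shingle each path. */
  static Set<String> shingles(String[][][] matrix, int minSize, int maxSize) {
    Set<String> out = new LinkedHashSet<>(); // drops shingles shared by several paths
    expand(matrix, 0, new ArrayList<>(), out, minSize, maxSize);
    return out;
  }

  // Recursively picks one alternative (row) per column; once every column has
  // contributed, the concatenated token sequence is shingled.
  private static void expand(String[][][] matrix, int column, List<String> path,
                             Set<String> out, int minSize, int maxSize) {
    if (column == matrix.length) {
      for (int start = 0; start < path.size(); start++) {
        for (int size = minSize; size <= maxSize && start + size <= path.size(); size++) {
          out.add(String.join("_", path.subList(start, start + size)));
        }
      }
      return;
    }
    for (String[] alternative : matrix[column]) {
      List<String> next = new ArrayList<>(path);
      next.addAll(Arrays.asList(alternative));
      expand(matrix, column + 1, next, out, minSize, maxSize);
    }
  }

  public static void main(String[] args) {
    String[][][] matrix = {
      {{"hello"}, {"greetings", "and", "salutations"}},
      {{"world"}, {"earth"}, {"tellus"}}
    };
    for (String shingle : shingles(matrix, 2, 3)) {
      System.out.println(shingle);
    }
  }
}
```

      Running `main` prints the same twelve shingles as the assertions, in a different order. Materializing every path grows multiplicatively with the number of alternatives per column, which is one reason a real filter would not work this way.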
      

      The patch contains tests of varying complexity that demonstrate offsets, position increments, payload boost calculation, and construction of a matrix from a token stream.
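      For offsets, a common convention for shingles is that a shingle starts where its first token starts and ends where its last token ends. The sketch below illustrates that convention only; it is an assumption about what the offset tests check, not code from the patch, and `ShingleOffsets`, `Tok` and `shingle` are hypothetical names.

```java
public class ShingleOffsets {

  /** Hypothetical minimal token: term text plus character offsets into the source text. */
  static final class Tok {
    final String text;
    final int start;
    final int end;

    Tok(String text, int start, int end) {
      this.text = text;
      this.start = start;
      this.end = end;
    }
  }

  /** Assumed convention: the shingle spans from its first token's start to its last token's end. */
  static Tok shingle(Tok[] parts, String spacer) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        sb.append(spacer);
      }
      sb.append(parts[i].text);
    }
    return new Tok(sb.toString(), parts[0].start, parts[parts.length - 1].end);
  }

  public static void main(String[] args) {
    // Source text "hello world": hello = [0,5), world = [6,11)
    Tok s = shingle(new Tok[] {new Tok("hello", 0, 5), new Tok("world", 6, 11)}, "_");
    System.out.println(s.text + " [" + s.start + "," + s.end + ")");
  }
}
```

      Running `main` prints `hello_world [0,11)`.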

      The matrix tries to use as little memory as possible by reading no more than maximumShingleSize columns ahead in the stream and releasing resources (columns and unique token sets) once they are no longer needed. It can still be optimized quite a bit, though.
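      The bounded look-ahead can be sketched as a sliding window of columns: a shingle that starts at column i spans at most columns i to i + maximumShingleSize - 1, so buffering maximumShingleSize columns is enough, and the leftmost column can be released once the window moves past it. `ColumnWindow` is a hypothetical illustration, not the patch's data structure, and the upstream token stream is replaced by a plain `Iterator` of columns.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class ColumnWindow<C> {
  private final Iterator<C> source;      // hypothetical stand-in for the upstream token stream
  private final int maximumShingleSize;  // longest shingle, counted in columns
  private final Deque<C> window = new ArrayDeque<>();
  private int peak = 0;                  // largest number of columns ever buffered

  public ColumnWindow(Iterator<C> source, int maximumShingleSize) {
    this.source = source;
    this.maximumShingleSize = maximumShingleSize;
  }

  // Read ahead only until maximumShingleSize columns are buffered.
  private void fill() {
    while (window.size() < maximumShingleSize && source.hasNext()) {
      window.addLast(source.next());
    }
    peak = Math.max(peak, window.size());
  }

  /** Columns a shingle starting at the current column may span; empty once the stream is exhausted. */
  public List<C> current() {
    fill();
    return new ArrayList<>(window);
  }

  /** No shingle can start at the leftmost column any more, so release it. */
  public void advance() {
    fill();
    window.pollFirst();
  }

  public int peak() {
    return peak;
  }

  public static void main(String[] args) {
    ColumnWindow<String> w =
        new ColumnWindow<>(Arrays.asList("c0", "c1", "c2", "c3", "c4").iterator(), 3);
    for (List<String> cols = w.current(); !cols.isEmpty(); w.advance(), cols = w.current()) {
      System.out.println(cols);
    }
    System.out.println("peak buffered columns: " + w.peak());
  }
}
```

      With maximumShingleSize 3, `main` prints the windows `[c0, c1, c2]`, `[c1, c2, c3]`, `[c2, c3, c4]`, `[c3, c4]` and `[c4]`, and reports a peak of 3 buffered columns regardless of stream length.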

      Attachments

        1. LUCENE-1320.txt
          50 kB
          Karl Wettin
        2. LUCENE-1320.txt
          59 kB
          Karl Wettin
        3. LUCENE-1320.txt
          70 kB
          Karl Wettin
        4. LUCENE-1320.patch
          22 kB
          Grant Ingersoll


            People

            Assignee: Karl Wettin
            Reporter: Karl Wettin
            Votes: 0
            Watchers: 0
