  1. Lucene - Core
  2. LUCENE-10220

Add a utility method to get IntervalSource from analyzed text (or token stream)

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.1
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      The Intervals class has a number of utility methods that provide an IntervalSource for tokens, phrases, etc. But it's missing an important bit: an interval source matching the tokens that result from running some string through a full analysis chain. Those tokens correspond to what actually resides in the index, and they are hard to predict from the outside.
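
      To illustrate why the indexed tokens are hard to predict from the raw string, here is a minimal, self-contained sketch. It does not use any Lucene classes: ToyAnalysis, its stop word list, and its crude plural stripping are hypothetical stand-ins for a real analysis chain, assumed only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy stand-in for an analysis chain (hypothetical; not a Lucene class). */
final class ToyAnalysis {

  /** A term together with the position it would occupy in the index. */
  static final class Token {
    final String term;
    final int position;

    Token(String term, int position) {
      this.term = term;
      this.position = position;
    }

    @Override
    public String toString() {
      return term + "@" + position;
    }
  }

  // Hypothetical stop word list, for illustration only.
  private static final Set<String> STOP_WORDS = Set.of("the", "of", "a", "an");

  /** Splits on non-alphanumerics, lowercases, drops stop words, strips a trailing "s". */
  static List<Token> analyze(String text) {
    List<Token> tokens = new ArrayList<>();
    int position = -1;
    for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
      if (raw.isEmpty()) {
        continue; // leading delimiter produces an empty first chunk
      }
      position++; // a removed stop word still consumes a position, leaving a gap
      String term = raw.toLowerCase(Locale.ROOT);
      if (STOP_WORDS.contains(term)) {
        continue;
      }
      // Crude plural stripping as a stand-in for a real stemmer.
      if (term.length() > 3 && term.endsWith("s")) {
        term = term.substring(0, term.length() - 1);
      }
      tokens.add(new Token(term, position));
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Prints [hound@1, baskerville@3]
    System.out.println(analyze("The Hounds of Baskerville"));
  }
}
```

      In this sketch the four-word input yields two terms with a position gap between them. A caller who builds a phrase interval by hand from the raw words would get the terms and the spacing wrong, which is the case for deriving the interval source from the analyzed token stream instead.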

      This is an important omission in Intervals as a utility class.

      I borrowed the implementation from the then-ASL-licensed Elasticsearch code at: 

      https://github.com/elastic/elasticsearch/blob/7.10/server/src/main/java/org/elasticsearch/index/query/IntervalBuilder.java#L54-L106

      I also modified it slightly to fit the static-method-based Lucene API. I also added a small test that showcases how this method can be used in practice (and why it's hard to accomplish the same result with existing methods).

      The only thing I'm not sure about is how to attribute Elasticsearch properly. In the NOTICE file, perhaps?

            People

              Assignee: Dawid Weiss (dweiss)
              Reporter: Dawid Weiss (dweiss)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h