  1. Lucene - Core
  2. LUCENE-5879

Add auto-prefix terms to block tree terms dict

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: core/codecs
    • Labels: None
    • Lucene Fields: New

    Description

      This cool idea to generalize numeric/trie fields came from Adrien:

      Today, when we index a numeric field (LongField, etc.) we pre-compute
      (via NumericTokenStream) outside of indexer/codec which prefix terms
      should be indexed.
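
      To make the static scheme concrete, here is a minimal sketch of how
      a fixed precisionStep yields one prefix term per precision level. It
      is not the actual NumericTokenStream code: the class name, the
      "shift:hex" labels and the valueBits parameter are illustrative
      assumptions, and only non-negative values are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene's NumericTokenStream.
final class StaticPrefixTerms {

    /**
     * Returns one "shift:prefix" label per precision level of a non-negative value.
     * shift 0 is the full-precision term; each later entry drops precisionStep more low bits.
     */
    static List<String> prefixTerms(long value, int precisionStep, int valueBits) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<String> terms = new ArrayList<>();
        for (int shift = 0; shift < valueBits; shift += precisionStep) {
            terms.add(shift + ":" + Long.toHexString(value >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Every value gets the same number of prefix terms, however sparse its neighborhood is.
        System.out.println(prefixTerms(0x1234L, 4, 16));
    }
}
```

      Note that the number of terms per value depends only on
      precisionStep, never on how many other values share each prefix.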

      But this can be inefficient: you set a static precisionStep, and
      always add those prefix terms regardless of how the terms in the field
      are actually distributed. Yet in real-world applications the terms
      typically have a non-random distribution.

      So, it should be better if instead the terms dict decides where it
      makes sense to insert prefix terms, based on how dense the terms are
      in each region of term space.
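
      A rough sketch of the density idea, assuming string terms and a
      single hypothetical threshold minTermsInPrefix: count how many terms
      fall under each candidate prefix and keep only the prefixes that are
      dense enough. This is not the patch's algorithm, just one simple way
      to pick prefixes from the actual term distribution.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; names and threshold are assumptions.
final class DensityPrefixSketch {

    /**
     * Maps each proper prefix shared by at least minTermsInPrefix terms
     * to the number of terms under it. Terms are assumed to be distinct.
     */
    static TreeMap<String, Integer> choosePrefixes(List<String> terms, int minTermsInPrefix) {
        if (minTermsInPrefix < 2) {
            throw new IllegalArgumentException("minTermsInPrefix must be >= 2");
        }
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String term : terms) {
            // Count the term under every proper prefix of itself.
            for (int len = 1; len < term.length(); len++) {
                counts.merge(term.substring(0, len), 1, Integer::sum);
            }
        }
        // Drop sparse regions: a prefix term there would not pay for itself.
        counts.values().removeIf(count -> count < minTermsInPrefix);
        return counts;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("aa1", "aa2", "aa3", "ab1", "zz9");
        System.out.println(choosePrefixes(terms, 3));
    }
}
```

      Here the dense "a" and "aa" regions get prefix terms, while the lone
      "zz9" gets none, unlike a static scheme that would add prefixes for
      every term.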

      This way we can speed up both term ranges (e.g. infix suggester) and
      numeric ranges at query time, and it should also let us use less
      index space.

      This would also mean that min/maxTerm for a numeric field would now be
      correct, vs today where the externally computed prefix terms are
      placed after the full precision terms, causing hairy code like
      NumericUtils.getMaxInt/Long. So optimizations like LUCENE-5860 become
      feasible.
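
      The min/max problem can be illustrated with a small sketch. It
      assumes, as the description implies, that externally computed prefix
      terms sort after all full-precision terms (modeled here by ordering
      on shift first). The Term class and method names are hypothetical,
      and only non-negative values are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene's term encoding.
final class MaxTermSketch {

    /** One indexed term: shift 0 is full precision, larger shifts are coarser prefixes. */
    static final class Term {
        final int shift;
        final long bits;

        Term(int shift, long bits) {
            this.shift = shift;
            this.bits = bits;
        }
    }

    /** Builds all terms for the values and sorts them by shift, then by bits. */
    static List<Term> indexedTerms(long[] values, int precisionStep, int valueBits) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Term> terms = new ArrayList<>();
        for (long value : values) {
            for (int shift = 0; shift < valueBits; shift += precisionStep) {
                terms.add(new Term(shift, value >>> shift));
            }
        }
        terms.sort((a, b) -> a.shift != b.shift
                ? Integer.compare(a.shift, b.shift)
                : Long.compare(a.bits, b.bits));
        return terms;
    }

    /** The naive "max term": simply the last term in sorted order. */
    static Term lastTerm(List<Term> sorted) {
        return sorted.get(sorted.size() - 1);
    }

    /** The real maximum value: requires skipping every prefix term. */
    static long maxFullPrecision(List<Term> sorted) {
        long max = Long.MIN_VALUE;
        for (Term term : sorted) {
            if (term.shift == 0) {
                max = Math.max(max, term.bits);
            }
        }
        return max;
    }
}
```

      With values 5 and 200 and a step of 4 bits, the last term in sorted
      order is a coarse prefix term rather than 200, which is why extra
      code is needed to recover the true maximum.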

      The terms dict can also do tricks not possible if you must live on top
      of its APIs, e.g. to handle the adversarial/over-constrained case where
      a given prefix has too many terms following it but its finer prefixes
      each have too few (what block tree calls "floor term blocks").

      Attachments

        1. LUCENE-5879.patch
          327 kB
          Michael McCandless
        2. LUCENE-5879.patch
          226 kB
          Michael McCandless
        3. LUCENE-5879.patch
          225 kB
          Michael McCandless
        4. LUCENE-5879.patch
          220 kB
          Michael McCandless
        5. LUCENE-5879.patch
          218 kB
          Michael McCandless
        6. LUCENE-5879.patch
          292 kB
          Michael McCandless
        7. LUCENE-5879.patch
          281 kB
          Michael McCandless
        8. LUCENE-5879.patch
          279 kB
          Michael McCandless
        9. LUCENE-5879.patch
          268 kB
          Michael McCandless
        10. LUCENE-5879.patch
          244 kB
          Michael McCandless
        11. LUCENE-5879.patch
          217 kB
          Michael McCandless
        12. LUCENE-5879.patch
          213 kB
          Michael McCandless
        13. LUCENE-5879.patch
          149 kB
          Michael McCandless
        14. LUCENE-5879.patch
          133 kB
          Michael McCandless

            People

              mikemccand Michael McCandless