Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-1582

Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 2.9
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      TrieRange has currently the following problem:

      • To add a field, that uses a trie encoding, you can manually add each term to the index or use a helper method from TrieUtils. The helper method has the problem, that it uses a fixed field configuration
      • TrieUtils currently creates per default a helper field containing the lower precision terms to enable sorting (limitation of one term/document for sorting)
      • trieCodeLong/Int() creates unnecessarily String[] and char[] arrays that is heavy for GC, if you index lot of numeric values. Also a lot of char[] to String copying is involved.

      This issue should improve this:

      • trieCodeLong/Int() returns a TokenStream. During encoding, all char[] arrays are reused by Token API, additional String[] arrays for the encoded result are not created, instead the TokenStream enumerates the trie values.
      • Trie fields can be added to Documents during indexing using the standard API: new Field(name,TokenStream,...), so no extra util method needed. By using token filters, one could also add payload and so and customize everything.

      The drawback is: Sorting would not work anymore. To enable sorting, a (sub-)issue can extend the FieldCache to stop iterating the terms, as soon as a lower precision one is enumerated by TermEnum. I will create a "hack" patch for TrieUtils-use only, that uses a non-checked Exceptionin the Parser to stop iteration. With LUCENE-831, a more generic API for this type can be used (custom parser/iterator implementation for FieldCache). I will attach the field cache patch (with the temporary solution, until FieldCache is reimplemented) as a separate patch file, or maybe open another issue for it.

        Attachments

        1. LUCENE-1582.patch
          78 kB
          Uwe Schindler
        2. LUCENE-1582.patch
          69 kB
          Uwe Schindler
        3. ASF.LICENSE.NOT.GRANTED--LUCENE-1582.patch
          67 kB
          Uwe Schindler
        4. ASF.LICENSE.NOT.GRANTED--LUCENE-1582.patch
          37 kB
          Uwe Schindler

          Issue Links

            Activity

              People

              • Assignee:
                thetaphi Uwe Schindler
                Reporter:
                thetaphi Uwe Schindler
              • Votes:
                1 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: