  1. Lucene - Core
  2. LUCENE-5966

How to migrate from numeric fields to auto-prefix terms


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Lucene Fields: New

    Description

      In LUCENE-5879 we are adding auto-prefix terms to the default terms dictionary. This generalizes what numeric fields do today, and offers faster performance while using less index space and about the same indexing time.

      But many users already have indices containing numeric fields, so ideally we would offer a simple way for them to switch over to auto-prefix terms.

      Robert has a good plan (copied from LUCENE-5879):

      Here are some thoughts.

      1. Keep the current trie "encoding" for terms; it just uses precision step = Inf and lets the term dictionary add the prefix terms automatically.
      2. Create a FilterAtomicReader that, for a previously trie-encoded field, removes the "fake" terms on merge.
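The two steps above can be sketched with a simplified model. This is not Lucene code: `TrieTermsSketch`, `Term`, `trieTerms` and `dropFakeTerms` are hypothetical names, and a term is modeled as a (shift, prefix) pair rather than Lucene's actual byte encoding. The sketch only illustrates that a finite precision step produces extra lower-precision ("fake") terms per value, that an effectively infinite step (here `Integer.MAX_VALUE`) produces only the full-precision term, and that dropping every term with shift > 0 on merge makes an old segment look like a new one.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of trie-encoded numeric terms; all names here are illustrative.
class TrieTermsSketch {

    /** Stand-in for a prefix-coded term: the value with its low `shift` bits dropped. */
    static final class Term {
        final int shift;
        final long prefix;

        Term(int shift, long prefix) {
            this.shift = shift;
            this.prefix = prefix;
        }

        @Override
        public String toString() {
            return "(shift=" + shift + ", prefix=" + prefix + ")";
        }
    }

    /** Emits one term per precision level: shift = 0, step, 2*step, ... while shift < 64. */
    static List<Term> trieTerms(long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Term> terms = new ArrayList<>();
        // Use a long counter so a very large step cannot overflow.
        for (long shift = 0; shift < 64; shift += precisionStep) {
            terms.add(new Term((int) shift, value >>> shift));
        }
        return terms;
    }

    /** Models the filtering reader applied on merge: keep only full-precision terms. */
    static List<Term> dropFakeTerms(List<Term> terms) {
        List<Term> kept = new ArrayList<>();
        for (Term t : terms) {
            if (t.shift == 0) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long value = 1234L;
        List<Term> oldSegment = trieTerms(value, 16);                 // finite step: extra terms
        List<Term> afterMerge = dropFakeTerms(oldSegment);            // fake terms removed
        List<Term> newSegment = trieTerms(value, Integer.MAX_VALUE);  // "infinite" step
        System.out.println("old segment:  " + oldSegment);
        System.out.println("after merge:  " + afterMerge);
        System.out.println("new segment:  " + newSegment);
    }
}
```

In this model the merged old segment and the newly written segment end up with the same single full-precision term per value, which is the state the plan aims for.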

      Users could continue to use NumericRangeQuery, just with the infinite precision step, and it will always work. It will simply execute slower on old segments, because it doesn't take advantage of the trie terms that have not yet been merged away.

      One obstacle to making this really nice is that Lucene doesn't know for sure that a field is numeric, so it cannot be "full-auto". Apps would have to use their schema, or whatever else they have, to wrap with this reader in their merge policy.

      Maybe we could provide some sugar for this, such as a wrapping merge policy that takes a list of numeric field names, or sugar to pass this to the IndexWriterConfig (IWC) in IndexUpgrader to force it, and so on.
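The "list of field names" idea might look roughly like the following. This is a self-contained sketch, not a Lucene merge policy: `NumericFieldMergeFilterSketch`, `Term` and `termsForMerge` are hypothetical names, and terms are again modeled as (shift, prefix) pairs. It only shows the decision the application would have to supply, namely that fields listed as numeric have their lower-precision terms dropped on merge while every other field passes through unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model of a merge-time filter driven by an app-supplied list of numeric fields.
class NumericFieldMergeFilterSketch {

    /** Stand-in for a term; shift > 0 marks a lower-precision ("fake") trie term. */
    static final class Term {
        final int shift;
        final long prefix;

        Term(int shift, long prefix) {
            this.shift = shift;
            this.prefix = prefix;
        }
    }

    private final Set<String> numericFields;

    /** The application passes in the field names it knows to be numeric (e.g. from its schema). */
    NumericFieldMergeFilterSketch(Collection<String> numericFields) {
        this.numericFields = new HashSet<>(numericFields);
    }

    /** Returns the terms a merge would copy for the given field. */
    List<Term> termsForMerge(String field, List<Term> segmentTerms) {
        if (!numericFields.contains(field)) {
            return segmentTerms; // not declared numeric: leave untouched
        }
        List<Term> kept = new ArrayList<>();
        for (Term t : segmentTerms) {
            if (t.shift == 0) {
                kept.add(t); // keep only full-precision terms
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        NumericFieldMergeFilterSketch filter =
            new NumericFieldMergeFilterSketch(Arrays.asList("price", "timestamp"));
        List<Term> terms = Arrays.asList(new Term(0, 1234L), new Term(16, 0L), new Term(32, 0L));
        System.out.println("price terms kept: " + filter.termsForMerge("price", terms).size());
        System.out.println("title terms kept: " + filter.termsForMerge("title", terms).size());
    }
}
```

Keeping the field list outside the filter reflects the constraint described above: the index itself does not record which fields are numeric, so the application has to say so.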

          People

            Assignee: Unassigned
            Reporter: Michael McCandless (mikemccand)