  1. Lucene - Core
  2. LUCENE-7989

Add computed (at segment flush) doc values fields


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • New

    Description

      This is a failed experiment but I thought I'd open an issue and post the patch in case it inspires others.

      It adds a new feature to Lucene that lets you provide a function (set via IndexWriterConfig) that is invoked at segment flush time to create a new doc values field as a function of all other doc values fields in that segment. The newly created field is "first class": it behaves as if you had indexed actual doc values fields on your documents, so it can participate in index sort, etc. The interesting part is that the function has access to all other documents that made it into the flushed segment (by pulling doc values iterators for that segment).
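To make the idea concrete, here is a minimal plain-Java sketch of "a function of all other doc values fields in the segment". It does not use Lucene classes and is not the API from the patch: the `computeAtFlush` hook, the map-of-columns stand-in for a flushed segment, and the `family` / `price` field names are all hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ComputedFieldSketch {

    /**
     * Hypothetical flush-time hook. The "segment" is modeled as a map from
     * field name to one long value per document (a stand-in for doc values).
     * The function sees every document in the segment at once.
     */
    public static long[] computeAtFlush(Map<String, long[]> segmentFields,
                                        Function<Map<String, long[]>, long[]> fn) {
        return fn.apply(segmentFields);
    }

    /**
     * Example function: each doc gets the maximum "price" found among all
     * docs in the same segment that share its "family" value.
     */
    public static long[] maxPricePerFamily(Map<String, long[]> fields) {
        long[] family = fields.get("family");
        long[] price = fields.get("price");

        // First pass: aggregate across the whole segment.
        Map<Long, Long> maxByFamily = new HashMap<>();
        for (int doc = 0; doc < family.length; doc++) {
            maxByFamily.merge(family[doc], price[doc], Long::max);
        }

        // Second pass: write one computed value per document.
        long[] computed = new long[family.length];
        for (int doc = 0; doc < family.length; doc++) {
            computed[doc] = maxByFamily.get(family[doc]);
        }
        return computed;
    }

    public static void main(String[] args) {
        Map<String, long[]> segment = new HashMap<>();
        segment.put("family", new long[] {1, 2, 1});
        segment.put("price", new long[] {10, 5, 30});

        long[] computed = computeAtFlush(segment, ComputedFieldSketch::maxPricePerFamily);
        System.out.println(Arrays.toString(computed)); // prints [30, 5, 30]
    }
}
```

The point of the sketch is only that the computed column depends on the whole segment, not on one document at a time, which is what distinguishes it from a value computed before indexing.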

      Anyway, I got the feature working, and it's a surprisingly small core code change. But I had a very specific use case in mind: to "coalesce" documents by their families while sorting them by another field. I realized that even though the feature works, I cannot use it for this particular use case, since the coalescing would break during merge (it's not just a simple "merge sort"). The test case I added, simulating my use case, fails on those seeds / test multipliers that trigger merging of the random index.
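One plausible reading of why merging breaks the coalescing, sketched in plain Java under stated assumptions: suppose each family's sort key is an aggregate computed per segment (here, a per-segment maximum), and segments are combined by an ordinary merge sort on that key. The issue does not spell out the exact scheme, so the key definition, the `{family, sortKey}` document layout, and all method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeBreaksCoalescing {

    /** Standard two-way merge of lists already sorted by sortKey; each doc is {family, sortKey}. */
    public static List<long[]> mergeByKey(List<long[]> a, List<long[]> b) {
        List<long[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i)[1] <= b.get(j)[1]) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    /** True if every family occupies one contiguous run of documents. */
    public static boolean isCoalesced(List<long[]> docs) {
        Set<Long> seen = new HashSet<>();
        long prev = Long.MIN_VALUE; // sentinel; assumes no family id equals Long.MIN_VALUE
        for (long[] doc : docs) {
            long family = doc[0];
            if (family != prev) {
                if (!seen.add(family)) {
                    return false; // family reappears after another family intervened
                }
                prev = family;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Keys are per-segment family maxima, so family 1 has key 30 in A but key 5 in B.
        List<long[]> segmentA = Arrays.asList(
            new long[] {2, 20}, new long[] {1, 30}, new long[] {1, 30});
        List<long[]> segmentB = Arrays.asList(
            new long[] {1, 5}, new long[] {2, 25});

        List<long[]> merged = mergeByKey(segmentA, segmentB);
        System.out.println(isCoalesced(segmentA)); // prints true
        System.out.println(isCoalesced(segmentB)); // prints true
        System.out.println(isCoalesced(merged));   // prints false
    }
}
```

Under these assumptions the merged family order is 1, 2, 2, 1, 1: each input segment is coalesced, but the merge interleaves family 1 because its key was computed independently in each segment. Restoring the grouping would require recomputing the key over the merged segment, which a simple merge sort does not do.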

      I'll post a patch but I don't plan to push this any further!

      Attachments

        1. LUCENE-7989.patch
          47 kB
          Michael McCandless

        Activity

          People

            Assignee: Unassigned
            Reporter: Michael McCandless (mikemccand)
            Votes: 0
            Watchers: 3
