  Lucene - Core / LUCENE-9051

Implement random access seeks in IndexedDISI (DocValues)


    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • New


      In LUCENE-9004 we have a use case for random-access seeking in DocValues, which currently support only forward iteration (with efficient skipping). One idea there was to write an entirely new format to cover these cases. While looking into that, I noticed that our current DocValues addressing implementation, IndexedDISI, already provides a good basis for random access. I worked up a patch that does this. We can already jump to a block, thanks to the jump tables added last year by toke; the patch uses those, and/or rewinds the iteration within the current block as needed.
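
      The seek strategy described above (jump to the target's block through a jump table, otherwise rewind inside the current block) can be sketched with a toy model. This is not the patch and not Lucene's IndexedDISI: the class name, fields, and the flat int[] layout are all assumptions made for illustration. The only detail borrowed from IndexedDISI is the block size of 65536 doc IDs.

```java
/** Toy model only: NOT Lucene's IndexedDISI. All names here are hypothetical. */
final class RandomAccessDisiSketch {
  private static final int BLOCK_SHIFT = 16; // 65536 doc IDs per block

  private final int[] docs;       // sorted doc IDs that have a value
  private final int[] blockStart; // jump table: blockStart[b] = index in docs of the first doc in block b or later
  private int block = -1;         // block of the most recent seek, -1 before the first seek
  private int lastTarget = -1;    // target of the most recent seek
  private int pos = 0;            // cursor into docs

  /** Assumes sortedDocs is strictly increasing, maxDoc > 0, and every doc ID is in [0, maxDoc). */
  RandomAccessDisiSketch(int[] sortedDocs, int maxDoc) {
    docs = sortedDocs.clone();
    int numBlocks = ((maxDoc - 1) >>> BLOCK_SHIFT) + 1;
    blockStart = new int[numBlocks + 1];
    int i = 0;
    for (int b = 0; b <= numBlocks; b++) {
      while (i < docs.length && (docs[i] >>> BLOCK_SHIFT) < b) {
        i++;
      }
      blockStart[b] = i;
    }
  }

  /** Positions on target, in any order, and reports whether target has a value. Requires 0 <= target < maxDoc. */
  boolean seekExact(int target) {
    int targetBlock = target >>> BLOCK_SHIFT;
    if (targetBlock != block) {
      // Different block, forward or backward: jump straight to its start.
      block = targetBlock;
      pos = blockStart[block];
    } else if (target < lastTarget) {
      // Same block but going backwards: rewind to the start of the block.
      pos = blockStart[block];
    }
    lastTarget = target;
    int end = blockStart[block + 1];
    // Scan forward within the block until we reach or pass the target.
    while (pos < end && docs[pos] < target) {
      pos++;
    }
    return pos < end && docs[pos] == target;
  }

  /** Ordinal of the current doc among docs with values; meaningful only after seekExact returned true. */
  int index() {
    return pos;
  }
}
```

      With docs {3, 10, 70000, 70005, 200000} and maxDoc 262144, seeking 70005 and then 70000 exercises the rewind path, and seeking 10 afterwards exercises a backward jump to another block. The real format stores blocks in several encodings, so the in-block scan here is only a stand-in.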

      I ran a very simple performance test comparing forward-only iteration with random seeks and saw no difference. That can't be right, so I wonder whether we have a more thorough DocValues performance test somewhere that I could repurpose. I'll probably go back and dig into the issue where we added the jump tables; I seem to recall some testing was done then.

      Aside from performance testing the implementation, there is the question of whether we should alter our API guarantees in this way. This might be controversial; I don't know the history or all the reasoning behind the way it is today. We provide advanceExact, and some implementations support docids going backwards while others don't. AssertingNumericDocValues.advanceExact does enforce forward iteration (in tests); what would be the consequence of relaxing that? We'd then open ourselves up to requiring all DV impls to support random access. Are there other impls to worry about, though? I'm not sure. I'd appreciate y'all's input on this one.
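
      To make the contract question concrete, here is a minimal sketch of a guard that either enforces or relaxes forward-only targets. It is a hypothetical stand-in, not the code of AssertingNumericDocValues: the class name, the allowBackward flag, and the use of an IntPredicate in place of a real DocValues implementation are all assumptions.

```java
import java.util.function.IntPredicate;

/** Hypothetical stand-in for an asserting wrapper; NOT Lucene's AssertingNumericDocValues. */
final class ForwardOnlyGuard {
  private final IntPredicate delegate; // plays the role of the wrapped advanceExact(target)
  private final boolean allowBackward; // false = today's forward-only contract, true = relaxed
  private int lastTarget = -1;

  ForwardOnlyGuard(IntPredicate delegate, boolean allowBackward) {
    this.delegate = delegate;
    this.allowBackward = allowBackward;
  }

  boolean advanceExact(int target) {
    if (target < 0) {
      throw new IllegalArgumentException("target must be >= 0, got " + target);
    }
    if (!allowBackward && target < lastTarget) {
      // Forward-only contract: a caller that goes backwards is rejected here.
      throw new IllegalStateException(
          "target " + target + " is before previous target " + lastTarget);
    }
    lastTarget = target;
    return delegate.test(target);
  }
}
```

      Relaxing the check is a one-line change in the guard, but it moves the burden onto every wrapped implementation: with allowBackward set to true, any delegate that cannot handle a smaller target would now fail or misbehave silently instead of being caught in tests.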



              Assignee: Unassigned
              Reporter: Michael Sokolov (sokolov)
              Votes: 0
              Watchers: 2



                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 10m