  Lucene - Core: LUCENE-5729

explore random-access methods to IndexInput


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • None
    • None
    • None
    • None
    • New

    Description

      Traditionally, Lucene's I/O is mostly sequential reading of postings lists, and IndexInput is geared toward that. For random-access data such as docvalues, that design just creates overhead.

      Today we work around this by doing random access with seek() followed by readXXX(), but this is inefficient: the JDK performs additional checks that we don't need.
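      A minimal sketch of the two access styles, using a plain java.nio.ByteBuffer rather than Lucene's IndexInput (the class and method names are hypothetical). The relative version must move a cursor before every read; the absolute version does a single index-checked read and leaves the cursor alone:

```java
import java.nio.ByteBuffer;

final class SeekVsAbsolute {
    // Relative style, mirroring IndexInput's seek(pos) + readLong():
    // position() validates and moves the cursor, then getLong() checks
    // the remaining bytes again and advances the cursor by 8.
    static long readLongRelative(ByteBuffer buf, int pos) {
        buf.position(pos);
        return buf.getLong();
    }

    // Absolute style, mirroring the proposed readLong(long pos):
    // one index-checked read that leaves the cursor untouched.
    static long readLongAbsolute(ByteBuffer buf, int pos) {
        return buf.getLong(pos);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(32); // big-endian by default
        for (int i = 0; i < 4; i++) {
            buf.putLong(i * 8, 100L + i); // absolute writes: values 100..103
        }
        System.out.println(readLongRelative(buf, 16)); // 102
        System.out.println(readLongAbsolute(buf, 24)); // 103
    }
}
```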

      As a hack, I added the following methods to IndexInput, changed the direct packed ints decoder to use them, and implemented them in MMapDirectory:

      byte readByte(long pos) --> ByteBuffer.get(pos)
      short readShort(long pos) --> ByteBuffer.getShort(pos)
      int readInt(long pos) --> ByteBuffer.getInt(pos)
      long readLong(long pos) --> ByteBuffer.getLong(pos)
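      ByteBuffer's absolute methods take an int index, while these methods take a long file position, so an implementation over several mapped buffers has to split the position first. The sketch below is hypothetical (PositionalReads and ChunkedPositionalReads are illustrative names, not MMapDirectory's code) and assumes big-endian chunks where every chunk except the last holds exactly 1 << chunkShift bytes. Reads inside one chunk use a single absolute ByteBuffer call; reads that straddle two chunks fall back to assembling bytes:

```java
import java.nio.ByteBuffer;

// Hypothetical interface holding the four absolute reads listed above.
interface PositionalReads {
    byte readByte(long pos);
    short readShort(long pos);
    int readInt(long pos);
    long readLong(long pos);
}

// Hypothetical sketch: big-endian chunks, all but the last of size 1 << chunkShift.
final class ChunkedPositionalReads implements PositionalReads {
    private final ByteBuffer[] chunks;
    private final int chunkShift;
    private final long chunkMask;

    ChunkedPositionalReads(ByteBuffer[] chunks, int chunkShift) {
        this.chunks = chunks;
        this.chunkShift = chunkShift;
        this.chunkMask = (1L << chunkShift) - 1;
    }

    private ByteBuffer chunk(long pos) {
        return chunks[(int) (pos >>> chunkShift)]; // which chunk holds pos
    }

    private int offset(long pos) {
        return (int) (pos & chunkMask); // int offset inside that chunk
    }

    @Override
    public byte readByte(long pos) {
        return chunk(pos).get(offset(pos));
    }

    @Override
    public short readShort(long pos) {
        ByteBuffer b = chunk(pos);
        int off = offset(pos);
        if (off + Short.BYTES <= b.limit()) {
            return b.getShort(off); // fast path: one absolute read
        }
        return (short) readAcross(pos, Short.BYTES); // straddles two chunks
    }

    @Override
    public int readInt(long pos) {
        ByteBuffer b = chunk(pos);
        int off = offset(pos);
        if (off + Integer.BYTES <= b.limit()) {
            return b.getInt(off);
        }
        return (int) readAcross(pos, Integer.BYTES);
    }

    @Override
    public long readLong(long pos) {
        ByteBuffer b = chunk(pos);
        int off = offset(pos);
        if (off + Long.BYTES <= b.limit()) {
            return b.getLong(off);
        }
        return readAcross(pos, Long.BYTES);
    }

    // Slow path: assemble n big-endian bytes one at a time across a chunk boundary.
    private long readAcross(long pos, int n) {
        long v = 0;
        for (int i = 0; i < n; i++) {
            v = (v << 8) | (readByte(pos + i) & 0xFFL);
        }
        return v;
    }

    public static void main(String[] args) {
        // Two 8-byte chunks (chunkShift = 3) holding the bytes 0x00..0x0F.
        ByteBuffer[] chunks = {ByteBuffer.allocate(8), ByteBuffer.allocate(8)};
        for (int i = 0; i < 16; i++) {
            chunks[i >>> 3].put(i & 7, (byte) i);
        }
        PositionalReads in = new ChunkedPositionalReads(chunks, 3);
        System.out.println(Integer.toHexString(in.readInt(0))); // 10203
        System.out.println(Long.toHexString(in.readLong(4)));   // 405060708090a0b
    }
}
```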
      

      This gives a ~30% performance improvement for docvalues (numerics, sorting on string fields, etc.).

      We should do a few things before working on this (LUCENE-5728: use the slice API in decode, pad packed ints so that decoding a value only ever needs one I/O call, etc.), but I think we need to figure out such an API.
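      One possible reading of the padding idea, as a self-contained sketch: the layout (values bit-packed most significant bit first, followed by 7 zero bytes) and all names are assumptions for illustration, not Lucene's actual packed ints format. A value starts at most 7 bits into its first byte, so any value of up to 57 bits fits in the 64-bit word read at that byte, and the padding guarantees that word is always in bounds, so decoding never needs a second read or an end-of-data branch:

```java
import java.nio.ByteBuffer;

// Hypothetical layout: MSB-first bit packing plus 7 padding bytes, so an
// 8-byte read starting at any value's first byte stays in bounds.
final class PaddedPackedInts {
    static final int PADDING_BYTES = 7;

    static void checkBits(int bitsPerValue) {
        // shift (0..7) + bitsPerValue must fit in one 64-bit word
        if (bitsPerValue < 1 || bitsPerValue > 57) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 57]");
        }
    }

    // Encoder: writes the low bitsPerValue bits of each value, MSB first.
    static ByteBuffer pack(long[] values, int bitsPerValue) {
        checkBits(bitsPerValue);
        long totalBits = (long) values.length * bitsPerValue;
        int dataBytes = (int) ((totalBits + 7) >>> 3);
        ByteBuffer buf = ByteBuffer.allocate(dataBytes + PADDING_BYTES);
        for (int i = 0; i < values.length; i++) {
            long start = (long) i * bitsPerValue;
            for (int b = 0; b < bitsPerValue; b++) {
                if (((values[i] >>> (bitsPerValue - 1 - b)) & 1L) != 0) {
                    long bit = start + b;
                    int idx = (int) (bit >>> 3);
                    int mask = 0x80 >>> (int) (bit & 7);
                    buf.put(idx, (byte) (buf.get(idx) | mask));
                }
            }
        }
        return buf;
    }

    // Decoder: exactly one absolute 8-byte read per value (bitsPerValue assumed valid).
    static long get(ByteBuffer buf, int bitsPerValue, long index) {
        long start = index * bitsPerValue;
        int idx = (int) (start >>> 3);  // first byte holding the value
        int shift = (int) (start & 7);  // leading bits of that byte owned by earlier values
        long word = buf.getLong(idx);   // the single read
        return (word >>> (64 - shift - bitsPerValue)) & ((1L << bitsPerValue) - 1);
    }

    public static void main(String[] args) {
        long[] values = {5, 0, 7, 3, 6, 1};
        ByteBuffer packed = pack(values, 3);
        for (int i = 0; i < values.length; i++) {
            System.out.print(get(packed, 3, i) + " "); // 5 0 7 3 6 1
        }
        System.out.println();
    }
}
```

      The cost of this choice is up to 7 extra bytes per packed block, traded for a branch-free decode.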

      It could either live on IndexInput, like my hack (similar to the ByteBuffer API, which has both relative and absolute methods), or we could have a separate API. But arguably IOContext also exists to supply hints, so I don't know which is the way to go.
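      The "separate API" option can be sketched as follows. Everything here is hypothetical (SequentialReader, RandomAccessView and randomAccessView() are illustrative names, not Lucene classes): the sequential reader keeps its seek/readXXX methods, and random-access consumers obtain a distinct view whose absolute reads never touch the reader's file pointer:

```java
import java.nio.ByteBuffer;

// Hypothetical random-access view handed out by a sequential reader.
interface RandomAccessView {
    long readLong(long pos);
}

final class SequentialReader {
    private final ByteBuffer buf;

    SequentialReader(ByteBuffer buf) {
        this.buf = buf;
    }

    // Relative API, as used by postings-style sequential consumers.
    void seek(long pos) {
        buf.position(Math.toIntExact(pos));
    }

    long readLong() {
        return buf.getLong();
    }

    long getFilePointer() {
        return buf.position();
    }

    // Absolute API exposed through a separate object: its reads never move
    // this reader's file pointer, so both styles can share the same data.
    RandomAccessView randomAccessView() {
        return pos -> buf.getLong(Math.toIntExact(pos));
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(24);
        for (int i = 0; i < 3; i++) {
            data.putLong(i * 8, 10L * (i + 1)); // 10, 20, 30
        }
        SequentialReader in = new SequentialReader(data);
        RandomAccessView view = in.randomAccessView();
        in.seek(8);
        System.out.println(view.readLong(16)); // 30
        System.out.println(in.readLong());     // 20, the view did not move the file pointer
    }
}
```

      Compared with putting absolute methods directly on IndexInput (the ByteBuffer-style approach), a separate view keeps the two access patterns in distinct types, at the cost of one more object to create and pass around.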


          People

            Assignee: Unassigned
            Reporter: Robert Muir (rcmuir)
            Votes: 0
            Watchers: 2
