  1. Lucene - Core
  2. LUCENE-7460

Should SortedNumericDocValues expose a per-document random-access API?

    Details

    • Type: Wish
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Sorted numerics used to expose a per-document random-access API so that accessing the median or max element would be cheap. The new SortedNumericDocValues still exposes the number of values a document has, but the only way to read values is to call nextValue(), which forces callers to read every value in order to reach the max value.

      For instance, SortedNumericSelector.MAX does the following in master (the important part is the for-loop):

          private void setValue() throws IOException {
            int count = in.docValueCount();
            for(int i=0;i<count;i++) {
              value = in.nextValue();
            }
          }
      
          @Override
          public int nextDoc() throws IOException {
            int docID = in.nextDoc();
            if (docID != NO_MORE_DOCS) {
              setValue();
            }
            return docID;
          }
      

      while it used to simply look up the value at index count-1 in 6.x:

          @Override
          public long get(int docID) {
            in.setDocument(docID);
            final int count = in.count();
            if (count == 0) {
              return 0; // missing
            } else {
              return in.valueAt(count-1);
            }
          }
      

      This could be a conscious decision, since a sequential API gives the codec more opportunities to compress efficiently, but on the other hand it prevents sorting by max or median values from being efficient.

      On my end I have a preference for the random-access API.
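
      To make the cost difference concrete, here is a minimal sketch that models the two access styles with plain arrays and counts how many values each one reads. It is not Lucene code: SequentialValues, RandomAccessValues, ArrayValues and the lower-median convention (index (count-1)/2) are assumptions made for illustration, and only the method names docValueCount, nextValue, count and valueAt are borrowed from the snippets above.

```java
/** Toy model of the two access styles discussed above; not Lucene code. */
public class SortedNumericAccessSketch {

    /** Hypothetical stand-in for the iterator-style API: sorted values, one per call. */
    interface SequentialValues {
        int docValueCount();
        long nextValue();
    }

    /** Hypothetical stand-in for the 6.x-style API: sorted values addressable by index. */
    interface RandomAccessValues {
        int count();
        long valueAt(int index);
    }

    /** Array-backed values for one document; counts every value read. */
    static final class ArrayValues implements SequentialValues, RandomAccessValues {
        private final long[] sorted;
        private int next = 0;
        int reads = 0;

        ArrayValues(long... sorted) { this.sorted = sorted; }

        @Override public int docValueCount() { return sorted.length; }
        @Override public int count() { return sorted.length; }
        @Override public long nextValue() { reads++; return sorted[next++]; }
        @Override public long valueAt(int index) { reads++; return sorted[index]; }
    }

    /** Max via the sequential API: every value must be consumed. */
    static long maxSequential(SequentialValues in) {
        long value = 0; // 0 stands for "missing", as in the 6.x snippet
        int count = in.docValueCount();
        for (int i = 0; i < count; i++) {
            value = in.nextValue();
        }
        return value;
    }

    /** Max via random access: a single lookup at index count-1. */
    static long maxRandomAccess(RandomAccessValues in) {
        int count = in.count();
        return count == 0 ? 0 : in.valueAt(count - 1);
    }

    /** Lower median via the sequential API: reads (count-1)/2 + 1 values. */
    static long medianSequential(SequentialValues in) {
        int count = in.docValueCount();
        if (count == 0) {
            return 0;
        }
        long value = 0;
        int target = (count - 1) / 2;
        for (int i = 0; i <= target; i++) {
            value = in.nextValue();
        }
        return value;
    }

    /** Lower median via random access: a single lookup. */
    static long medianRandomAccess(RandomAccessValues in) {
        int count = in.count();
        return count == 0 ? 0 : in.valueAt((count - 1) / 2);
    }

    public static void main(String[] args) {
        ArrayValues seq = new ArrayValues(1, 3, 5, 7, 9);
        ArrayValues rnd = new ArrayValues(1, 3, 5, 7, 9);
        System.out.println("sequential max=" + maxSequential(seq) + " reads=" + seq.reads);
        System.out.println("random     max=" + maxRandomAccess(rnd) + " reads=" + rnd.reads);
    }
}
```

      In this toy model, the sequential max over five values costs five reads while the random-access version costs one; the median shows the same pattern with roughly half the reads. Whether that gap matters in practice depends on how a codec decodes values, which the sketch does not model.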

              People

              • Assignee: Unassigned
              • Reporter: Adrien Grand (jpountz)
              • Votes: 0
              • Watchers: 4
