Lucene - Core
  1. Lucene - Core
  2. LUCENE-2531

FieldComparator.TermOrdValComparator compares by value unnecessarily

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.

      Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.

      This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields. This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

      1. LUCENE-2504.patch
        3 kB
        Michael McCandless
      2. LUCENE-2504-3x.patch
        3 kB
        Michael McCandless
      3. LUCENE-2531.patch
        5 kB
        Michael McCandless
      4. LUCENE-2531-3x.patch
        4 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Attached patches, for 3.1 and 4.0.

        This also cleans up the String/TermOrdValComparator, absorbing the convert method directly into setBottom, and doing away with some dead code. With this fix we no longer compare by value at all if index is a single segment (I think!).

        Show
        Michael McCandless added a comment - Attached patches, for 3.1 and 4.0. This also cleans up the String/TermOrdValComparator, absorbing the convert method directly into setBottom, and doing away with some dead code. With this fix we no longer compare by value at all if index is a single segment (I think!).
        Hide
        Michael McCandless added a comment -

        Another iteration on the patches with some small further optimizing.

        Show
        Michael McCandless added a comment - Another iteration on the patches with some small further optimizing.
        Hide
        Yonik Seeley added a comment -

        The thing that jumps out at me is the removal of "reversed"... I'd need to apply the patch to study enough to figure that out, but any clues you could give would be helpful (yes, I'm being lazy

        Show
        Yonik Seeley added a comment - The thing that jumps out at me is the removal of "reversed"... I'd need to apply the patch to study enough to figure that out, but any clues you could give would be helpful (yes, I'm being lazy
        Hide
        Michael McCandless added a comment -

        I hear you; this stuff is kinda hairy

        The usage of reverse (and, sortPos) was only in dead code, inside the convert method.

        The convert method used to be used more often (to do on-demand convert of the ord when its readerGen != current one), but we stopped doing that and only convert the bottom slot now, which mean this if:

              if (sortPos == 0 && bottomSlot != -1 && bottomSlot != slot) {
        

        Can never be true (bottomSlot is always == slot), so that true clause was all dead code anyway.

        Show
        Michael McCandless added a comment - I hear you; this stuff is kinda hairy The usage of reverse (and, sortPos) was only in dead code, inside the convert method. The convert method used to be used more often (to do on-demand convert of the ord when its readerGen != current one), but we stopped doing that and only convert the bottom slot now, which mean this if: if (sortPos == 0 && bottomSlot != -1 && bottomSlot != slot) { Can never be true (bottomSlot is always == slot), so that true clause was all dead code anyway.
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Unassigned
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development