Lucene - Core / LUCENE-5750

Speed up monotonic address access in BINARY/SORTED_SET

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, Trunk
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I found this while exploring LUCENE-5748, but it currently applies to both variable-length BINARY and SORTED_SET, so I think it's worth doing here first.

      I think it's just a holdover from before MonotonicBlockPackedWriter that, to access element N, we currently do:

      startOffset = (docID == 0 ? 0 : ordIndex.get(docID-1));
      endOffset = ordIndex.get(docID);
      

      That's because we previously didn't have packed ints that supported > Integer.MAX_VALUE elements. But that's been fixed for a long time. If we just write a 0 first and do this:

      startOffset = ordIndex.get(docID);
      endOffset = ordIndex.get(docID+1);
      

      The access is then much faster. For sorting I see around a 20% improvement. We don't lose any compression, because we should assume the delta from 0 .. 1 is similar to any other gap N .. N+1.
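
      The two layouts described above can be compared with a minimal sketch. It is not the code from the patch: a plain long[] stands in for the monotonic packed-ints reader (ordIndex in the snippets above), and the class and method names are hypothetical.

```java
// Sketch only: a long[] replaces the packed-ints address reader, and all
// names here are hypothetical, not Lucene APIs.
final class AddressLayoutSketch {

    /** Old layout: entry i holds the END offset of document i (no leading 0). */
    static long[] oldRange(long[] endOffsets, int docID) {
        // Document 0 needs a special case because nothing precedes its entry.
        long start = (docID == 0) ? 0L : endOffsets[docID - 1];
        long end = endOffsets[docID];
        return new long[] { start, end };
    }

    /** New layout: a 0 is written first, so entry i holds the START offset of document i. */
    static long[] newRange(long[] startOffsets, int docID) {
        long start = startOffsets[docID];   // no branch on docID == 0
        long end = startOffsets[docID + 1]; // the adjacent entry
        return new long[] { start, end };
    }

    /** Builds the new layout from per-document lengths: one extra entry, beginning with 0. */
    static long[] buildWithLeadingZero(int[] lengths) {
        long[] addresses = new long[lengths.length + 1];
        for (int i = 0; i < lengths.length; i++) {
            addresses[i + 1] = addresses[i] + lengths[i];
        }
        return addresses;
    }
}
```

      For per-document lengths {3, 0, 5}, the new layout stores {0, 3, 3, 8} where the old one stores {3, 3, 8}. Both return the same ranges, at the cost of one extra stored value.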

      1. LUCENE-5750.patch
        4 kB
        Robert Muir
      2. LUCENE-5750.patch
        4 kB
        Robert Muir

        Activity

        ASF subversion and git services added a comment -

        Commit 1601755 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1601755 ]

        LUCENE-5750: speed up monotonic address in BINARY/SORTED_SET

        ASF subversion and git services added a comment -

        Commit 1601750 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1601750 ]

        LUCENE-5750: speed up monotonic address in BINARY/SORTED_SET

        Adrien Grand added a comment -

        + 1

        Robert Muir added a comment -

        add +1L to the SORTED_SET case (it's special and takes an 'int' docid, versus BINARY, which already uses long addressing)
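
        One plausible reading of the +1L remark is that it forces the index arithmetic to happen in long rather than int. The sketch below only illustrates that Java language rule. The class and method names are hypothetical, and the issue does not say whether the overflow can actually occur in practice.

```java
// Sketch only: demonstrates int vs. long arithmetic on an int docID.
// Names are hypothetical; this is not code from the patch.
final class LongAddressingSketch {

    /** The addition is done in int, so it wraps before being widened to long. */
    static long nextIndexInt(int docID) {
        return docID + 1;
    }

    /** The 1L literal promotes docID to long first, so the addition cannot wrap. */
    static long nextIndexLong(int docID) {
        return docID + 1L;
    }
}
```

        For ordinary values the two methods agree. They differ only at Integer.MAX_VALUE, where the int version wraps to a negative number.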

        Michael McCandless added a comment -

        +1

        Robert Muir added a comment -

        patch (we have a new DV format for 4.9, so it's a good time to fix it)


          People

          • Assignee: Unassigned
          • Reporter: Robert Muir
          • Votes: 0
          • Watchers: 4
