Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently there is an oversharing problem in packedints that imposes too many requirements on improving it:

      • every packed ints must be able to be loaded directly, or in ram, or iterated with.
      • things like filepointers are expected to be adjusted (this is especially stupid) in all cases
      • lots of unnecessary abstractions
      • versioning etc is complex

      None of this flexibility is needed or buys us anything, and it prevents performance improvements (e.g. i just want to add 3 bytes at the end of on-disk streams to reduce the number of bytebuffer calls and thats seriously impossible with the current situation).

      1. LUCENE-5731.patch
        204 kB
        Robert Muir
      2. LUCENE-5731.patch
        204 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        Related: the in-ram stuff is also complex, and has tons of generated code.

        For the postings lists mike and I experimented with a much simpler approach here: https://github.com/rmuir/lucene-solr/tree/packypack

        It gives speedups (especially for positions with higher bpv), with 800 lines of total code (https://github.com/rmuir/lucene-solr/blob/packypack/lucene/core/src/java/org/apache/lucene/util/lightpacked/SimplePackedInts.java) versus the huge size bloat we have today. So I think the in-ram stuff can use a touchup as well, but we dont need to tackle that here.

        Show
        Robert Muir added a comment - Related: the in-ram stuff is also complex, and has tons of generated code. For the postings lists mike and I experimented with a much simpler approach here: https://github.com/rmuir/lucene-solr/tree/packypack It gives speedups (especially for positions with higher bpv), with 800 lines of total code ( https://github.com/rmuir/lucene-solr/blob/packypack/lucene/core/src/java/org/apache/lucene/util/lightpacked/SimplePackedInts.java ) versus the huge size bloat we have today. So I think the in-ram stuff can use a touchup as well, but we dont need to tackle that here.
        Hide
        Robert Muir added a comment -

        Attached is a patch:

        • added new DirectWriter, DirectReader. They support > 2B values and don't have concepts like 'acceptableOverhead', instead its just simple and ensures every bpv is fast.
        • added RandomAccessInput api (default -> seek+read), with optimized impl for mmap.
        • Added 3 byte padding to the end of every DirectWriter stream, all decoding is one i/o operation.
        • DirectReader enforces its use
        • Added new Lucene49DocValuesFormat using this stuff.

        Across every bitsPerValue i see consistent performance gains, usually 50-75% from trunk today.

        Show
        Robert Muir added a comment - Attached is a patch: added new DirectWriter, DirectReader. They support > 2B values and don't have concepts like 'acceptableOverhead', instead its just simple and ensures every bpv is fast. added RandomAccessInput api (default -> seek+read), with optimized impl for mmap. Added 3 byte padding to the end of every DirectWriter stream, all decoding is one i/o operation. DirectReader enforces its use Added new Lucene49DocValuesFormat using this stuff. Across every bitsPerValue i see consistent performance gains, usually 50-75% from trunk today.
        Hide
        Robert Muir added a comment -

        just some bugfixes to the mmap stuff. I need to add dedicated tests for those tomorrow.

        Show
        Robert Muir added a comment - just some bugfixes to the mmap stuff. I need to add dedicated tests for those tomorrow.
        Hide
        Adrien Grand added a comment -

        +1 I like the new directory API and how direct packed ints use it. One minor note: the javadoc of Lucene49Codec refers to the lucene46 package instead of lucene49.

        Show
        Adrien Grand added a comment - +1 I like the new directory API and how direct packed ints use it. One minor note: the javadoc of Lucene49Codec refers to the lucene46 package instead of lucene49.
        Hide
        Michael McCandless added a comment -

        +1, this looks really nice.

        Show
        Michael McCandless added a comment - +1, this looks really nice.
        Hide
        ASF subversion and git services added a comment -

        Commit 1600412 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1600412 ]

        LUCENE-5731: split out direct packed ints from in-ram ones

        Show
        ASF subversion and git services added a comment - Commit 1600412 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1600412 ] LUCENE-5731 : split out direct packed ints from in-ram ones
        Hide
        ASF subversion and git services added a comment -

        Commit 1600423 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1600423 ]

        LUCENE-5731: split out direct packed ints from in-ram ones

        Show
        ASF subversion and git services added a comment - Commit 1600423 from Robert Muir in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1600423 ] LUCENE-5731 : split out direct packed ints from in-ram ones
        Hide
        Uwe Schindler added a comment - - edited

        Thanks Robert. I was very busy today, so I had no time to look into it. But from my first check it looks like our idea from the talk yesterday I was afraid to propose to implement this using an interface, thanks for doing it that way. Otherwise we would have crazyness in ByteBufferIndexInput. The interface hidden behind the randomAccessSlice() method just returning "slice()" is wonderful.

        @Override
        public RandomAccessInput randomAccessSlice(long offset, long length) throws IOException {
          // note: technically we could even avoid the clone...
          return slice(null, offset, length);
        }
        

        We can avoid the clone not in all cases, because we must duplicate the ByteBuffer, if the offset is different. But for the simple case, if you request the full IndexInput as slice (means offset==0L, length==this.length), we could return "this".

        EDIT: we cannot do this at the moment, because in the multi-mmap case, we change the bytebuffers's position. So we always have to clone (otherwise the random access slice would have side effects on file position of master slice).

        Show
        Uwe Schindler added a comment - - edited Thanks Robert. I was very busy today, so I had no time to look into it. But from my first check it looks like our idea from the talk yesterday I was afraid to propose to implement this using an interface, thanks for doing it that way. Otherwise we would have crazyness in ByteBufferIndexInput. The interface hidden behind the randomAccessSlice() method just returning "slice()" is wonderful. @Override public RandomAccessInput randomAccessSlice( long offset, long length) throws IOException { // note: technically we could even avoid the clone... return slice( null , offset, length); } We can avoid the clone not in all cases, because we must duplicate the ByteBuffer, if the offset is different. But for the simple case, if you request the full IndexInput as slice (means offset==0L, length==this.length), we could return "this". EDIT: we cannot do this at the moment, because in the multi-mmap case, we change the bytebuffers's position. So we always have to clone (otherwise the random access slice would have side effects on file position of master slice).

          People

          • Assignee:
            Robert Muir
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development