Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      For traditional Lucene access, which is mostly sequential with an occasional advance(), I think this method gets drowned out in the noise.

      But for access like docvalues, it's important. Unfortunately seek() is complex today because of the mapping across multiple buffers.

      However, the very common case is that only one map is used for a given clone or slice.

      When there is the possibility to use only a single mapped buffer, we should instead take advantage of ByteBuffer.slice(), which adjusts the internal mmap address and removes the offset calculation. Furthermore we don't need the shift/mask or even the negative check, as they are all handled by the ByteBuffer API: seek() becomes a one-liner (with a try/catch, of course, to convert exceptions).
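      A minimal sketch of that single-buffer seek, assuming the clone/slice already holds a pre-sliced ByteBuffer (class and field names here are illustrative only, not the actual patch):

      import java.io.EOFException;
      import java.io.IOException;
      import java.nio.ByteBuffer;

      // Sketch only: ByteBuffer.slice() has already absorbed the file offset,
      // so seek() is a bare position() call; out-of-range positions surface as
      // IllegalArgumentException and are converted to EOFException.
      final class SingleBufferSeekSketch {
        private final ByteBuffer curBuf; // slice() of the single mapped buffer

        SingleBufferSeekSketch(ByteBuffer slice) {
          this.curBuf = slice;
        }

        public void seek(long pos) throws IOException {
          try {
            // no shift/mask or offset arithmetic; note the long-to-int cast can
            // wrap for huge positions, which is discussed in the comments below
            curBuf.position((int) pos);
          } catch (IllegalArgumentException e) {
            throw new EOFException("seek past EOF: pos=" + pos);
          }
        }

        public long getFilePointer() {
          return curBuf.position();
        }
      }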

      This makes docvalues access 20% faster; I haven't tested conjunctions or anything like that.

      1. LUCENE-5722.patch
        21 kB
        Uwe Schindler
      2. LUCENE-5722.patch
        19 kB
        Uwe Schindler
      3. LUCENE-5722.patch
        20 kB
        Uwe Schindler
      4. LUCENE-5722.patch
        19 kB
        Uwe Schindler
      5. LUCENE-5722.patch
        18 kB
        Uwe Schindler
      6. LUCENE-5722.patch
        12 kB
        Robert Muir
      7. LUCENE-5722-multiseek.patch
        2 kB
        Uwe Schindler

        Activity

        Robert Muir added a comment -

        Patch: warning, it's quite ugly, but it looks correct and seems to do well with tests (all pass).

        I see a combined 45% improvement to docvalues performance with this patch and LUCENE-5720.

        The hairy part comes from the fact that even if we have a big file (e.g. the dv .dat today) with multiple buffers, slice() should be optimal in the case where only one buffer is needed to access that region. And I abstracted ByteBufferIndexInput, and I guess I'm paying the cost for that now.

        On the other hand this opens up additional things to explore, e.g. maybe we should override readByte/readBytes since it's much less code to inline in this case, and maybe we should investigate simply changing the DirectPackedReaders to just require a slice over their data (e.g. getFilePointer == 0) to remove the addition there.
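        For illustration, the single-buffer readByte/readBytes could look roughly like this (a sketch under the assumptions above, not the attached patch; the multi-buffer versions additionally carry buffer-switch fallback logic):

        import java.io.EOFException;
        import java.io.IOException;
        import java.nio.BufferUnderflowException;
        import java.nio.ByteBuffer;

        // Sketch: single-buffer reads have no buffer-switch fallback, so the method
        // bodies stay tiny and inline-friendly.
        final class SingleBufferReadSketch {
          private final ByteBuffer curBuf; // slice() of the single mapped buffer

          SingleBufferReadSketch(ByteBuffer slice) {
            this.curBuf = slice;
          }

          public byte readByte() throws IOException {
            try {
              return curBuf.get();
            } catch (BufferUnderflowException e) {
              throw new EOFException("read past EOF");
            }
          }

          public void readBytes(byte[] b, int offset, int len) throws IOException {
            try {
              curBuf.get(b, offset, len);
            } catch (BufferUnderflowException e) {
              throw new EOFException("read past EOF");
            }
          }
        }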

        Michael McCandless added a comment -

        +1 to specialize the single buffer case again.

        Does the optimization only kick in when the entire .dat (holding doc values for all fields in this segment) is less than MMapDir's chunk size? I.e., we don't use IndexInput.slice from our doc values impls today (at least for the default, Lucene45DVF); we just do .clone(), which in ByteBufferIndexInput is a slice of the whole file.

        Also, if you get unlucky, the .dat could be smallish but span 2 chunks anyway, e.g. if that segment used CFS format, and then specialization doesn't kick in, right?

        Uwe Schindler added a comment -

        Also, if you get unlucky, the .dat could be smallish but span 2 chunks anyway, e.g. if that segment used CFS format, and then specialization doesn't kick in, right?

        Of course, but on a 64-bit operating system the chunk size is 1 GiB, so this will happen rarely. It is more likely that you have, as discussed, a very big file that is completely mapped. I think we should do the improvement, but not do too much to prevent this multi-buffer case (otherwise the code gets more complicated and error-prone).

        In most cases, for a single segment and one single field, the improvement will kick in often enough. This is one reason why Robert said we should maybe investigate using slice() to access the docvalues of a specific field, instead of a full clone.

        There might be another way to make this a single-buffer case, but it is too hairy: we could do a fresh mapping for the slice. But this would also require unmapping that fresh slice, and in addition it consumes additional address space. One thing that could be done here: if we know in advance that we never need the full file, we could mmap only a slice. Maybe we should offer Directory#openInput(filename, offset, length), which could directly optimize for the single-buffer case? [Don't kill me about this suggestion, it was just an idea.]

        About the patch: I don't like "singleton" as a term, because it's closely related to the "singleton class instance" pattern. I would rename the single-buffer one to "SingleByteBufferIndexInput". The method singleton() is fine, I guess; just the class name.

        There is one thing we might want to add as an assert: in the single-buffer case, there is a slight chance of not catching an exception if the cast from the seek offset to int happens to land inside the valid slice area. Maybe we should not add a hard check, but for our own safety while writing the code, we should maybe check that the long offset is <= Integer.MAX_VALUE.

        I like the idea of using ByteBuffer.slice(). Unfortunately (I am so unhappy!), we cannot use this for the multi-buffer approach (because it would then require more calculations on clone, which are now optimized to be bit shifts and & only).

        Also, Eclipse warns if you call a static method through a subclass (if properly configured). ByteBufferIndexInput.newClonesMap() should not be accessed as MMapIndexInput.newCloneMap()... But that's just cosmetic - although it confused me, too (I hate Java for allowing static methods to be accessed through a different class name; we should maybe at least make the warning a failure in the Eclipse compiler config).

        In any case, we might improve the multi-buffer seek, too: if we already know beforehand that we will land in the same buffer, we could add a small check at the beginning of the seek method: if we hit the same buffer, just call curBuffer.position() and skip all the other work (which does many assignments and additional checks). I will think about this a bit more...

        Robert Muir added a comment -

        Sorry Uwe, we shouldn't add extra conditionals to seek here, it's the entire point of the issue!!!!!!!!

        Uwe Schindler added a comment - edited

        Sorry Uwe, we shouldn't add extra conditionals to seek here, it's the entire point of the issue!!!!!!!!

        Can you please quote what you are referring to? Thanks.

        Do you mean this one:

        There is one thing we might want to add as an assert: in the single-buffer case, there is a slight chance of not catching an exception if the cast from the seek offset to int happens to land inside the valid slice area. Maybe we should not add a hard check, but for our own safety while writing the code, we should maybe check that the long offset is <= Integer.MAX_VALUE.

        (This is just an assert to catch bugs while testing; I don't want to add a hard check.)

        or this one:

        In any case, we might improve the multi-buffer seek, too: if we already know beforehand that we will land in the same buffer, we could add a small check at the beginning of the seek method: if we hit the same buffer, just call curBuffer.position() and skip all the other work (which does many assignments and additional checks). I will think about this a bit more...

        Don't get me wrong, I was just adding ideas. I know what this issue is about, but sometimes one additional check that triggers the "common case" is better than having crazy code executed on every call. My idea is basically this (we can throw it away):

        Under the assumption that in most cases we do not seek across byte buffers, the overhead of seek could be optimized almost completely away with one additional check: we could add another "state" variable holding the (long) start offset of the current buffer (curBufOffset - unfortunately, this is one more state field that we need to maintain). On seek we check that seek's position parameter minus the current buffer's start offset is non-negative and less than Integer.MAX_VALUE away - this is one check: (difference & Integer.MAX_VALUE) == difference. If this is the case, we can directly call position() on the current buffer. If any exception happens we fall through to our dumb seek (because it is not always the EOF case).
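        A sketch of that fast path, with the hypothetical curBufOffset state field described above (names are illustrative; slowSeek() stands in for the existing shift/mask seek):

        import java.io.IOException;
        import java.nio.ByteBuffer;

        // Sketch of the proposed same-buffer fast path for the multi-buffer seek.
        final class MultiBufferSeekSketch {
          private ByteBuffer curBuf;   // currently active buffer
          private long curBufOffset;   // absolute file position where curBuf starts

          public void seek(long pos) throws IOException {
            final long diff = pos - curBufOffset;
            // true iff 0 <= diff <= Integer.MAX_VALUE, i.e. the target may lie in curBuf
            if ((diff & Integer.MAX_VALUE) == diff) {
              try {
                curBuf.position((int) diff); // common case: stay in the same buffer
                return;
              } catch (IllegalArgumentException e) {
                // diff is beyond curBuf's limit; not necessarily EOF, so fall through
              }
            }
            slowSeek(pos);
          }

          private void slowSeek(long pos) throws IOException {
            // stand-in for the existing multi-buffer seek
            // (buffer index = pos >>> chunkSizePower, position = pos & chunkSizeMask, ...)
            throw new UnsupportedOperationException("sketch: existing seek logic goes here");
          }
        }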

        Instead of throwing that away because you don't like it, I wanted to do a benchmark. Don't just say "this is bullshit"; I am trying hard to think about an optimization of the cross-buffer seek, too. There must be a way to do this! Maybe you have another idea instead of mine, but please don't tell me that I don't know what I am talking about! In the ideal case I would like to do no checks at all and let ByteBuffer throw exceptions if we seek outside the current buffer. But this is, as far as I know, impossible because of the cast from long to int (it would be nice to have something like the x86 carry flag (http://en.wikipedia.org/wiki/Carry_flag) in Java, too, so you could detect overflow when casting long to int).

        We could also maybe move the negative check at the beginning of seek() further down, or do it more intelligently?

        Robert Muir added a comment -

        I was just responding to the first comment! I do not really want an assert, as this method should be considered hot (it's the point of the issue) and asserts are not free.

        Robert Muir added a comment -

        As for the second idea, of course that's something nice to look into. But I don't think it will ever be simpler than the one-buffer case.

        Uwe Schindler added a comment -

        I was just responding to the first comment! I do not really want an assert, as this method should be considered hot (it's the point of the issue) and asserts are not free.

        OK, I just wanted to point out that we have no 100% safety if you seek outside the slice in the single-buffer case.

        I was just afraid that you were complaining about collecting ideas for optimizing the multi-buffer seek (see above). I am not sure if my idea helps or is risky at all; we should just try it out.

        Uwe Schindler added a comment -

        But I don't think it will ever be simpler than the one-buffer case.

        Of course not, but it could still help. To simplify benchmarking, we should look at both improvements separately (e.g., temporarily disable the single-buffer case while benchmarking).

        Uwe Schindler added a comment -

        I looked at the code for a long time, also at Robert's patch.

        I found out that the subclassing issue can be solved quite easily: we don't need to make ByteBufferIndexInput abstract; the solution would be to pass some "unmapper" instance to the constructor that does the unmapping, so freeBuffers does not need to be abstract. In that case we can use ByteBufferIndexInput as a concrete class.
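        A rough sketch of that idea, with a hypothetical BufferCleaner callback standing in for the "unmapper" (illustrative names, not the patch):

        import java.io.IOException;
        import java.nio.ByteBuffer;

        // Sketch: the unmap action is injected, so ByteBufferIndexInput can stay concrete.
        interface BufferCleaner {
          void freeBuffer(ByteBuffer b) throws IOException;
        }

        class ByteBufferIndexInputSketch {
          private final ByteBuffer[] buffers;
          private final BufferCleaner cleaner; // may be null, e.g. for clones that must not unmap

          ByteBufferIndexInputSketch(ByteBuffer[] buffers, BufferCleaner cleaner) {
            this.buffers = buffers;
            this.cleaner = cleaner;
          }

          public void close() throws IOException {
            if (cleaner != null) {
              for (ByteBuffer b : buffers) {
                cleaner.freeBuffer(b); // e.g. MMapDirectory's unmap hack
              }
            }
          }
        }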

        The second thing that is an issue in the multi-mmap seek is the offset. The offset is used in ByteBufferIndexInput only in seek and when creating slices/clones. The idea is now to completely remove the offset from the base class. The base class is usable for the case where offset = 0 and multiple buffers are used. The checks at the beginning of seek() are then useless, because they only apply to the case offset = 0; in all other cases we already catch the out-of-bounds cases via AIOOBE and similar.

        The special cases would then be:

        • SingleByteBufferIndexInput extends ByteBufferIndexInput: we can remove the assert, because offset no longer exists in the base class. We always use ByteBuffer.slice() here.
        • The other special case is offset != 0 for multi-mmap: in that case we have a second concrete subclass that just overrides seek() to do the offset checks at the beginning and, once everything is adjusted, calls super.seek().

        The cloning/slicing can be done much more easily, and we just include the offset there.

        Furthermore, I made a small improvement to ByteBufferIndexInput.seek() for the case where seeking happens inside the same buffer. With the optimizations above, the whole thing is then mostly a simple position() call on the byte buffer with a few calculations.

        I will sort out all this stuff and provide a patch!
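        As a compact sketch of the layout described above (all names illustrative, not the final patch): the base class assumes offset = 0, and only the offset-aware subclass re-adds the up-front check before delegating.

        import java.io.IOException;

        // Sketch: base class handles offset = 0 and multiple buffers; the offset-aware
        // subclass only validates and shifts the position before delegating.
        class MultiBufferBaseSketch {
          public void seek(long pos) throws IOException {
            // offset = 0 here: no up-front checks; out-of-bounds positions fall into
            // the AIOOBE/IllegalArgumentException path and are rethrown as EOFException
          }
        }

        final class OffsetAwareSketch extends MultiBufferBaseSketch {
          private final long offset; // start of this slice within the mapped buffers

          OffsetAwareSketch(long offset) {
            this.offset = offset;
          }

          @Override
          public void seek(long pos) throws IOException {
            // this check cannot be delegated: pos + offset could still be a valid
            // absolute position even if pos itself is negative
            if (pos < 0L) {
              throw new IllegalArgumentException("Seeking to negative position: " + pos);
            }
            super.seek(pos + offset);
          }
        }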

        Uwe Schindler added a comment -

        In addition, with my new approach, fields like offset and length will be final. On cloning we never call super.clone(); we just create a new instance. This makes the code easier to understand, and cloning can get further improvements!

        Uwe Schindler added a comment -

        Here is my patch. I also added a new test for slices of slices, so we can be sure that offsets are passed through correctly (this was the hardest part). The workaround is the protected method added to the base class, which is implemented only by the offset-aware impl.

        While debugging this and also looking at the TODO Robert added, I found out what is wrong: the issue is bug LUCENE-5658, which is now understood.

        The main problem was that when calculating the limit of the last buffer in the slice, it blows up for the special case where the slice ends exactly at a chunkSize boundary. So this commit would also fix LUCENE-5658.

        Nevertheless, I will commit the fix for LUCENE-5658 separately.

        Uwe Schindler added a comment -

        Better fix for the limit() problem.

        Uwe Schindler added a comment -

        This is a better patch for creating the slices (buildSlice is split off from slice() again). This buildSlice method is overridden in the offset-aware specialization.

        This is much better than the stupid getOffset() protected method I had before. It also no longer does useless checks when cloning.

        Uwe Schindler added a comment -

        After digging more, I found out that the bug in LUCENE-5658 is not related to this one. The problem was a bug in the offset calculation of the single-buffer special case. The test created a slice from a two-buffer input that started at the buffer boundary. The result was a slice with one buffer, so the optimization applied. But Robert's patch failed to apply the chunkSizeMask, so the offset was still the one from the two-buffer case.

        The patch also contains the test for LUCENE-5658, which of course passes. I also added a test for the special case Robert has seen.

        This patch still uses the extra 0-byte buffer at the end. We could improve this to handle that case and use only a single-buffer IndexInput, but the added complexity in buildSlice is not worth it.

        I think we can test performance now.

        In addition, there may be another improvement for the default impl's seek. But we should check this separately. I will upload a separate patch tomorrow.

        Uwe Schindler added a comment -

        For reference, this is the optimization I had in mind. I don't know if it helps for the multi-buffer case, but it may be worth a try.

        The patch may not apply cleanly; it's just for demonstration purposes.

        Robert Muir added a comment -

        For reference, this is the optimization I had in mind. I don't know if it helps for the multi-buffer case, but it may be worth a try.

        The patch may not apply cleanly; it's just for demonstration purposes.

        I tested this with sorting on 1M and 10M Wikipedia indexes: it's a consistent 7% improvement. +1 to just commit that one, and let's keep iterating on the more complex refactor!

        Uwe Schindler added a comment -

        OK, I will commit that to 4.x and trunk. Then I will upload a new patch for the big refactoring.

        ASF subversion and git services added a comment -

        Commit 1599218 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1599218 ]

        LUCENE-5722: Speed up MMapDirectory.seek() - first small patch for multi-mmap case

        ASF subversion and git services added a comment -

        Commit 1599219 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1599219 ]

        Merged revision(s) 1599218 from lucene/dev/trunk:
        LUCENE-5722: Speed up MMapDirectory.seek() - first small patch for multi-mmap case

        Uwe Schindler added a comment -

        New patch for current trunk:

        • Removed the useless tests which were there for debugging only
        • Factored out the factory for creating clone instances. For perf tests there are now 2 places to modify:
        1. static ByteBufferIndexInput newInstance() called from MMapDirectory to create the instance used to return as IndexInput
        2. protected ByteBufferIndexInput newCloneInstance() called when clones/slices are requested. This one also takes offset into account.

        For benchmarking we might comment out some parts of those 2 methods.

        Uwe Schindler added a comment -

        New patch, last one had a bug (merge problem).

        Robert Muir added a comment -

        +1 to current patch, this is just more speed on top of previous improvements today.

        With my sort test, it's an additional 7-10% (on top of the previous commit, which was similar). With a microbenchmark of NumericDocValues the improvement is far more substantial (it seems ~25%).

        In order to continue further, after this one is committed I want to exploit the slice API for packed ints: instead of clone()'ing the whole file in DV, we just slice() what we need, remove the offset adjustments in the packed ints decoder, and actually get more safety (you read past EOF if you screw up, instead of reading into another field's packed ints or whatever).
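        A sketch of that direction, assuming the trunk IndexInput.slice(description, offset, length) signature and hypothetical per-field offsets taken from the docvalues metadata:

        import java.io.IOException;
        import org.apache.lucene.store.IndexInput;

        // Sketch: give each field's packed-ints reader its own slice rather than a
        // clone of the whole .dat file plus a base offset.
        final class DocValuesSliceSketch {
          static IndexInput openFieldData(IndexInput data, long fieldOffset, long fieldLength)
              throws IOException {
            // before: data.clone() + seek(fieldOffset), with the offset added on every read
            // after: position 0 is the start of this field's data, and reading past
            // fieldLength hits EOF instead of silently reading a neighboring field's data
            return data.slice("dv-field-data", fieldOffset, fieldLength);
          }
        }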

        In parallel I will begin work on backporting the slice() API to 4.x; it has baked for a while and I think it is good to go. I'll start on this now.

        Uwe Schindler added a comment -

        I will just add some more tests to TestMultiMMap so it checks that the correct instances are returned (using instanceof). I just want to be sure the 2 factory methods create the right impl classes for every combination of slices and master IndexInputs.

        Uwe Schindler added a comment -

        New patch that adds a test which checks the implementations returned after getting an IndexInput and cloning/slicing. It asserts on random slices, their size, and the chunkSize used.

        I will commit this later!

        ASF subversion and git services added a comment -

        Commit 1599276 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1599276 ]

        LUCENE-5722: Optimize ByteBufferIndexInput#seek() by specializing implementations. This improves random access as used by docvalues codecs if used with MMapDirectory.

        ASF subversion and git services added a comment -

        Commit 1599278 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1599278 ]

        Merged revision(s) 1599276 from lucene/dev/trunk:
        LUCENE-5722: Optimize ByteBufferIndexInput#seek() by specializing implementations. This improves random access as used by docvalues codecs if used with MMapDirectory.

        Uwe Schindler added a comment -

        Thanks Robert for the quick backport of the slicer removal! Also thanks for benchmarking - great work together.

        We can open another issue to maybe specialize the single-buffer IndexInput more (override readByte()/readBytes()).


          People

          • Assignee: Uwe Schindler
          • Reporter: Robert Muir
          • Votes: 0
          • Watchers: 4
