Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1, 4.0-ALPHA
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels: None
    • Lucene Fields: New, Patch Available

Description

      Along the same lines as LUCENE-2816:

      • The readVInt/readVLong/readShort/readInt/readLong methods are not optimal here since they defer to readByte. For example, this means checking the buffer's bounds per byte in readVInt instead of per vint.
      • It's an easy win to speed this up, even for the vint case: it's essentially always faster. The only slower case is 1024 single-byte vints in a row, where we would do a single extra bounds check (1025 instead of 1024); see the sketch after this list.
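
      A minimal sketch of the idea (not the actual patch), assuming a BufferedIndexInput-like class with a byte[] buffer, a bufferPosition cursor, and a bufferLength count of valid bytes; all names here are illustrative:

          import java.io.IOException;

          // Sketch only: check the bounds once per vint (at most 5 bytes) instead of once per byte.
          abstract class BufferedInputSketch {
            byte[] buffer;       // refilled from the file by readByte()/refill()
            int bufferPosition;  // next byte to read
            int bufferLength;    // number of valid bytes currently in the buffer

            abstract byte readByte() throws IOException;  // per-byte slow path, refills as needed

            public int readVInt() throws IOException {
              if (5 <= bufferLength - bufferPosition) {
                // Fast path: a vint is at most 5 bytes, so one check covers the whole value.
                byte b = buffer[bufferPosition++];
                int i = b & 0x7F;
                for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                  b = buffer[bufferPosition++];
                  i |= (b & 0x7F) << shift;
                }
                return i;
              }
              // Slow path near the end of the buffer: fall back to per-byte reads.
              byte b = readByte();
              int i = b & 0x7F;
              for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = readByte();
                i |= (b & 0x7F) << shift;
              }
              return i;
            }
          }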

Activity

        Robert Muir added a comment -

        Here's the patch. I reverted my previous backwards break for the no-payloads optimization since this is actually faster overall.

        Here are the results for the standard codec (trunk). I've run this benchmark about 5 times to be sure.

        Query                              QPS trunk  QPS patch  Pct diff
        uni*                                   11.41      11.36     -0.5%
        unit*                                  20.57      20.47     -0.5%
        un*d                                   31.56      31.63      0.2%
        united~2.0                              1.66       1.67      0.5%
        unit~1.0                                5.21       5.23      0.5%
        unit~2.0                                5.09       5.14      0.9%
        united~1.0                              7.61       7.69      1.0%
        unit state                              7.11       7.21      1.3%
        spanNear([unit, state], 10, true)       2.67       2.72      2.0%
        state                                  25.09      25.61      2.1%
        +nebraska +state                       69.29      70.84      2.2%
        u*d                                     8.44       8.71      3.2%
        "unit state"                            4.98       5.17      3.8%
        +unit +state                            7.34       7.70      4.8%
        spanFirst(unit, 5)                     10.27      11.35     10.5%

        The optimization is more important, though, for methods like readInt (15% faster in my tests).
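
        A hedged sketch of the same idea applied to readInt(), continuing the hypothetical buffer/bufferPosition/bufferLength fields from the description; this is illustrative only, not the committed patch:

            public int readInt() throws IOException {
              if (4 <= bufferLength - bufferPosition) {
                // Fast path: one bounds check instead of four readByte() calls.
                return ((buffer[bufferPosition++] & 0xFF) << 24)
                     | ((buffer[bufferPosition++] & 0xFF) << 16)
                     | ((buffer[bufferPosition++] & 0xFF) << 8)
                     |  (buffer[bufferPosition++] & 0xFF);
              }
              // Slow path: defer to per-byte reads, which refill the buffer as needed.
              return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
                   | ((readByte() & 0xFF) << 8)  |  (readByte() & 0xFF);
            }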

        Uwe Schindler added a comment -

        +1, that's a good idea overall: always optimize the general case (the buffer is large enough) and fall back if not.

        Robert Muir added a comment -

        Sorry, my math was off: the worst case is 4 extra checks (1028 total?)... but in general the default buffer size (1024)
        is large enough that this helps.

        Paul Elschot added a comment -

        Did you also try to catch out-of-bounds exceptions instead of doing the bounds checks in the current patch?

        Robert Muir added a comment -

        Did you also try to catch out-of-bounds exceptions instead of doing the bounds checks in the current patch?

        Paul, how can we do this? In the mmap case we can, because the mmap getXXX will throw the BufferUnderflowException
        and not actually read anything if there aren't enough bytes.

        But in this case I don't see how we can rearrange the code to safely do this.
        If you know, please let us know, as I think this would be better too.

        Uwe Schindler added a comment -

        For the MMap case with very large buffers, the cost of exception instantiation with a stack trace is really small (the exception is rare). I think for small buffer sizes like 1024 the overhead would be immense. But we should check this.

        Another place where the checks were omitted and an AIOOBE is caught instead is FieldCacheRangeFilter. This really helps there! But in that case the FieldCache is always large.

        Robert Muir added a comment -

        Oh, I think I see your idea, Paul... so in the special case where refill() at the end of the file populates
        less than the buffer's length, we just have to copy to a shorter array, and this will work?

        Paul Elschot added a comment -

        Relying on the array's own bounds check would be possible when the array size equals the buffered size, but indeed this is not normally the case.
        I'll take a closer look; it might be worthwhile to use a smaller array object when there is no more data available.
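
        A hedged sketch of the refill() variant being discussed here (illustrative only; readFromFile and the field names are hypothetical): trim the array to the valid length at end of file so that reading past the data actually throws ArrayIndexOutOfBoundsException instead of returning stale bytes, which is what would make the catch-instead-of-check approach safe. A later refill() would have to restore the full-sized buffer again.

            abstract int readFromFile(byte[] b, int offset, int len) throws IOException;  // hypothetical helper

            void refill() throws IOException {
              int read = readFromFile(buffer, 0, buffer.length);
              if (read < buffer.length) {
                // Fewer bytes than the buffer can hold (end of file): switch to an array whose
                // length equals the valid data, so out-of-bounds reads throw instead of
                // silently returning stale bytes from the previous fill.
                buffer = java.util.Arrays.copyOf(buffer, read);
              }
              bufferPosition = 0;
              bufferLength = read;
            }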

        Paul Elschot added a comment -

        You are obviously more familiar with the code than I am.

        Robert Muir added a comment -

        Well, it's slightly trickier to do it this way (e.g. slimming down the array in the very special case but restoring the fully sized one later in a future refill()), but potentially worthwhile... we should at least benchmark the exception idea and see how it goes.

        Uwe Schindler added a comment - edited

        As said in my comment above, for buffer sizes like 1024 or thereabouts (even 16384 or so), the overhead of throwing the AIOOBE would be much higher (as the JVM needs to generate a stack trace!). I would simply not even think about that *g*.

        For MMap the idea is fine because we normally have "buffer" sizes around 2^31, where the AIOOBE / BufferUnderflowException is very seldom thrown.

        Paul Elschot added a comment -

        It's been too long since I had to deal with this tradeoff myself, so I'm taking a look here: http://www.javaspecialists.eu/archive/Issue187.html,
        "Cost of causing exceptions" (2010-08-31).

        Robert Muir added a comment -

        As said in my comment above, for buffer sizes like 1024 or thereabouts (even 16384 or so), the overhead of throwing the AIOOBE would be much higher (as the JVM needs to generate a stack trace!). I would simply not even think about that *g*.

        OK, I looked into this, and I think I agree with Uwe.

        I'm concerned about the JRE specifics here too (the cost of an exception, for example). I ran all the tests on the IKVM JVM the other day,
        and a weird one failed due to this:
        http://weblog.ikvm.net/CommentView.aspx?guid=062e4506-89c4-488e-9104-59c1ec80007b

        So I think the optimization here (reducing checks on average) is safe, but I'm worried about going the exception route with
        such a small buffer, even if we could tweak it to perform better on a Sun JRE.

        Paul Elschot added a comment -

        If I understood the javaspecialists article correctly, it could be faster to use AIOOBE, but only when -XX:-OmitStackTraceInFastThrow is not used on the Sun/Oracle JVM. For the first few hundred exceptions from the same piece of code, however, it would always be slower.

        Since this depends on the JVM, I'd rather keep the explicit bounds check in the code for now, and maybe open a separate issue if it turns out to be faster to use AIOOBE.

        Robert Muir added a comment -

        Paul, I agree with your sentiments; for this issue I'd like to stick with this patch as just a small step (reducing bounds checks on average).

        In general, I think we can probably improve our I/O (especially BufferedIndexInput) to be more efficient with
        the int block codecs. Don't hesitate to open more issues if there are ideas; I've definitely been testing a lot of crazy things, but
        the others didn't pan out.

        Paul Elschot added a comment -

        Do these performance comparison tests (posted on 19 December above) have a basic verification that each query returns the same result, for example a CRC on the matching docids and perhaps also a CRC on the score values?

        Robert Muir added a comment -

        Do these performance comparison tests (posted on 19 December above) have a basic verification that each query returns the same result, for example a CRC on the matching docids and perhaps also a CRC on the score values?

        Yes. The performance benchmark I used originally came from Mike; it builds on contrib/benchmark, and you can see it here: http://code.google.com/p/luceneutil/

        It verifies results as you suggested, and in fact it has found bugs before that our tests don't find... which frustrates me about our unit tests!
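
        For reference, a hypothetical sketch of that kind of verification (not luceneutil's actual code): fold each query's matching doc ids and rounded scores into a CRC and compare the checksum between the baseline and patched runs.

            import java.util.zip.CRC32;

            static long checksumHits(int[] docIds, float[] scores) {
              CRC32 crc = new CRC32();
              for (int i = 0; i < docIds.length; i++) {
                int d = docIds[i];
                crc.update(d >>> 24); crc.update(d >>> 16); crc.update(d >>> 8); crc.update(d);
                // Round scores slightly so tiny float noise doesn't trip the comparison.
                int s = Math.round(scores[i] * 1000f);
                crc.update(s >>> 24); crc.update(s >>> 16); crc.update(s >>> 8); crc.update(s);
              }
              return crc.getValue();
            }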

        Michael McCandless added a comment -

        I'm seeing excellent gains with this patch on Linux, 64-bit Java 6, NIOFSDir:

        Query                              QPS clean  QPS robspec  Pct diff
        spanFirst(unit, 5)                     16.67        15.62     -6.3%
        "unit state"                            8.04         7.87     -2.2%
        spanNear([unit, state], 10, true)       4.31         4.25     -1.2%
        "unit state"~3                          4.85         5.02      3.6%
        unit state                             10.35        10.94      5.7%
        unit~1.0                                9.60        10.15      5.7%
        unit~2.0                                9.35         9.94      6.3%
        united~2.0                              3.30         3.51      6.4%
        +nebraska +state                      161.71       174.23      7.7%
        +unit +state                           11.20        12.09      8.0%
        doctitle:.[Uu]nited.                    3.93         4.25      8.0%
        united~1.0                             15.12        16.39      8.4%
        un*d                                   49.33        56.09     13.7%
        u*d                                    14.85        16.97     14.3%
        state                                  25.95        30.12     16.1%
        unit*                                  22.72        26.88     18.3%
        uni*                                   12.64        15.20     20.2%
        doctimesecnum:[10000 TO 60000]          8.42        10.73     27.4%

        +1 to commit.

        Robert Muir added a comment -

        Committed revisions 1061619 and 1061622.

        Grant Ingersoll added a comment -

        Bulk close for 3.1


People

    • Assignee: Robert Muir
    • Reporter: Robert Muir
    • Votes: 0
    • Watchers: 0
