SOLR-7110: Optimize JavaBinCodec to minimize string Object creation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: None
    • Labels: None

      Description

      In JavaBinCodec we already optimize string creation when strings are repeated within the same payload. If we use a cache, it is possible to avoid string creation across objects as well.
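
      The core idea can be sketched as follows. This is a minimal, hypothetical illustration assuming a plain concurrent map keyed by the raw UTF-8 bytes; the actual patch is built around Solr's own cache implementation, and the class and method names here are invented for the example:

      import java.nio.charset.StandardCharsets;
      import java.util.Arrays;
      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;

      // Hypothetical sketch, not the committed code: return a previously
      // decoded String for a given byte sequence, creating it only on a miss.
      // The real patch bounds its cache (LRU); this toy map grows without limit.
      public class Utf8StringCache {

          // Wrapper so byte[] content (not array identity) drives equals/hashCode.
          private static final class Key {
              final byte[] bytes;
              final int hash;
              Key(byte[] bytes) {
                  this.bytes = bytes;
                  this.hash = Arrays.hashCode(bytes);
              }
              @Override public boolean equals(Object o) {
                  return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
              }
              @Override public int hashCode() { return hash; }
          }

          private final Map<Key, String> cache = new ConcurrentHashMap<>();

          public String get(byte[] utf8, int offset, int len) {
              byte[] copy = Arrays.copyOfRange(utf8, offset, offset + len);
              return cache.computeIfAbsent(new Key(copy),
                      k -> new String(k.bytes, StandardCharsets.UTF_8));
          }
      }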

      Attachments

      1. JavabinPerf.patch
        3 kB
        Noble Paul
      2. JavabinPerf.patch
        3 kB
        Noble Paul
      3. SOLR-7110.patch
        17 kB
        Noble Paul
      4. SOLR-7110.patch
        22 kB
        Noble Paul
      5. SOLR-7110.patch
        17 kB
        Noble Paul

        Activity

        Noble Paul added a comment -

        Yonik Seeley, I would be glad to hear your comments on this.

        Shalin Shekhar Mangar added a comment -

        Is this change back-compatible?

        Yonik Seeley added a comment -

        Background for others who don't know how this works: Solr's javabin format internally avoids repeating String keys by allowing a string to be specified by number if it has already been seen in the current message.

        But looking at the patch quickly, this isn't about reusing the "external string" across different messages. It is simply about avoiding String creation: one reads a sequence of UTF-8 bytes off the stream and, instead of creating a new String object, checks whether a cache already has a String for those bytes. This isn't unique to JavaBin either... one could use the same technique in any of our transports (including HTTP params).

        Gut feel is that, as written, this will be slower. The extra work + overhead of our concurrent LRU cache should swamp any savings. Has this been benchmarked?

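        To illustrate the extern-string mechanism described above, here is a toy write-side sketch. The tag values and wire layout are invented for the example and do not match the real javabin format:

        import java.io.ByteArrayOutputStream;
        import java.io.DataOutputStream;
        import java.io.IOException;
        import java.nio.charset.StandardCharsets;
        import java.util.HashMap;
        import java.util.Map;

        // Toy model of "strings specified by number": the first occurrence is
        // written in full and remembered; repeats are written as a small index.
        class ExternStringSketch {
            private final Map<String, Integer> seen = new HashMap<>();
            private final DataOutputStream out;

            ExternStringSketch(DataOutputStream out) { this.out = out; }

            void writeExternString(String s) throws IOException {
                Integer idx = seen.get(s);
                if (idx != null) {
                    out.writeByte(1);           // invented "by reference" tag
                    out.writeInt(idx);          // index only, no string bytes
                } else {
                    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
                    out.writeByte(0);           // invented "literal" tag
                    out.writeInt(utf8.length);
                    out.write(utf8);
                    seen.put(s, seen.size() + 1);
                }
            }

            public static void main(String[] args) throws IOException {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                ExternStringSketch w = new ExternStringSketch(new DataOutputStream(buf));
                w.writeExternString("id");
                w.writeExternString("id");      // second occurrence: index only
                System.out.println("bytes written: " + buf.size());
            }
        }
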
        Noble Paul added a comment -

        Is this change back-compatible?

        Yes. There is no change to the serialization format; only the deserialization logic is changed.

        This isn't unique to JavaBin either... one could use the same technique in any of our transports (including HTTP params).

        Yes. The good part about javabin is that possibly repeated strings are written as a distinct type; in other payloads we would need to identify which strings are 'cacheable'.

        The extra work + overhead of our concurrent LRU cache should swamp any savings. Has this been benchmarked?

        I plan to do the benchmark. My hunch is that performance will be similar; a map.get() cannot be far more expensive than an Object creation and subsequent GC.
        Even if the performance is the same, this helps a lot in cutting down String objects and hence GC.

        Noble Paul added a comment (edited) -

        Here are the benchmark results

        No. of objects in cache: 10K; no. of strings created: 10 million; no. of threads: 10

        ==============LRU 10 THREADS===============
         *************before test start***********
         Used Memory:10MB
         *************after LRU cache init***********
         Used Memory:14MB
         *************after cache test***********
         Used Memory:16MB
         time taken by LRUCACHE 709ms
         *************after new string test***********
         Used Memory:70MB
         time taken by string creation 668ms
        

        The takeaways: as expected, the time taken by both is negligible and does not really matter. Memory usage, however, is dramatically higher for string creation:
        54MB vs 6MB, i.e. roughly nine times the memory used. This probably has a bigger impact on our GC pauses.

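        The "Used Memory" figures above are presumably (totalMemory - freeMemory) snapshots; this is an assumption, since the actual measurement lives in the attached JavabinPerf.patch. A minimal sketch of that technique:

        // Assumed measurement technique (the real code is in the attached patch):
        // snapshot (totalMemory - freeMemory) in MB after a best-effort GC.
        public final class UsedMemory {
            static long usedMB() {
                Runtime rt = Runtime.getRuntime();
                rt.gc();                        // a hint only; the JVM may ignore it
                return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            }

            public static void main(String[] args) {
                System.out.println("Used Memory:" + usedMB() + "MB");
            }
        }
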
        Yonik Seeley added a comment -

        It's not clear what's what here... did the test take 668ms without the patch and 709ms with the patch?

        Measuring memory isn't enough here... GC is very efficient at cleaning up short lived objects, and the only downside to letting GC do it is some CPU time, which should show up in a throughput benchmark if this is a win.

        Noble Paul added a comment -

        This is a micro-benchmark. I'm attaching the perf test along with the patch: TestJavabinCodec.testPerf()

        Measuring memory isn't enough here... GC is very efficient at cleaning up short lived objects, and the only downside to letting GC do it is some CPU time, which should show up in a throughput benchmark if this is a win.

        The GC does not happen immediately; it happens when it happens. But the real challenge is that Solr has an increasingly large memory footprint, and as the heap grows, the probability of pushing the JVM into a full GC keeps going up. The overall CPU cost of this operation is still negligible, even when GC happens. So this is not really a CPU optimization; it is a memory optimization.

        Yonik Seeley added a comment -

        This should really only affect how fast the young gen fills up, which should not have an effect on full GCs.

        So, this is not really an optimization on CPU, it is a memory optimization.

        The question at hand is whether the memory optimization is worth the CPU overhead here. What's the long-term decoding throughput before and after the patch?

        Noble Paul added a comment -

        We are talking about a ~100ms difference for 10 million string objects. Assuming there are a few dozen cached strings in every request, the CPU costs are not even quantifiable.

        Otis Gospodnetic added a comment -

        Possibly better ways to test:

        • use something like SPM or VisualVM or anything that gives you visualization of:
          • various memory pools (size + utilization) in the heap
          • GC activity (frequency, avg time, max time, size, etc.)
          • CPU usage
        • enable GC logging, grep for FullGC, or run jstat

        ... all of this over time: not just a few minutes, but longer runs before the patch vs. after the patch. Then you can really see what difference this makes.

        Noble Paul added a comment (edited) -

        My point is, the CPU cost of map lookups is zero (or near zero; maybe a nanosecond or less per request). The memory costs are obvious from these simple tests.

        IMHO, Solr is a memory hog. We should proactively try to bring its footprint down before it becomes unmanageable.

        Noble Paul added a comment (edited) -

        Changed the scope to just add the functionality to JavaBinCodec without using it anywhere yet. I guess this can go in and we can decide whether to use it later. So it currently has no impact on performance until we use it directly in some component.

        ASF subversion and git services added a comment -

        Commit 1673149 from Noble Paul in branch 'dev/trunk'
        [ https://svn.apache.org/r1673149 ]

        SOLR-7110: Optimize JavaBinCodec to minimize string Object creation

        ASF subversion and git services added a comment -

        Commit 1673150 from Noble Paul in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1673150 ]

        SOLR-7110: Optimize JavaBinCodec to minimize string Object creation

        Noble Paul added a comment -

        As of now, it is not used anywhere, but the feature is in.

        ASF subversion and git services added a comment -

        Commit 1673161 from Yonik Seeley in branch 'dev/trunk'
        [ https://svn.apache.org/r1673161 ]

        SOLR-7110: reformat new code

        ASF subversion and git services added a comment -

        Commit 1673162 from Yonik Seeley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1673162 ]

        SOLR-7110: reformat new code

        ASF subversion and git services added a comment -

        Commit 1673186 from Yonik Seeley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1673186 ]

        SOLR-7110: fix break to 5x build

        Shalin Shekhar Mangar added a comment -

        I see that in JavaBinCodec.readExternString you have changed:

        -      String s = (String) readVal(fis);
        +      tagByte = fis.readByte();
        +      String s = readStr(fis, stringCache);
        

        But I cannot find the corresponding change to writeExternString. How does this work?

        Yonik Seeley added a comment -

        OK, I did some performance testing (since I don't see the reason to commit something that won't be used). I decoded random documents with the same set of fields (meaning there will be a 100% hit rate on the string cache - a best-case scenario for it).

        • Single-threaded: using a string cache lowered decoding performance anywhere from 2.5 to 7 percent... the median looked to be around 3.5% lower.
        • 2-core processor with 4 decoding threads: I saw decreases in performance ranging from 18% to 30%.
        • 4-core processor with 4 decoding threads: I saw decreases in performance averaging about 23%.

        So in general, it seems like trying to cache relatively small objects with a relatively expensive cache is a loss.

        Yonik Seeley added a comment -

        But I cannot find the corresponding change to writeExternString. How does this work?

        readVal would normally read the tag byte then call readStr if it was a string, so hopefully it's equivalent.

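        A self-contained toy of the equivalence described above. Everything here (tag value, length encoding, method bodies) is invented for illustration and is not the real JavaBinCodec wire format:

        import java.io.ByteArrayInputStream;
        import java.io.DataInputStream;
        import java.io.IOException;
        import java.nio.charset.StandardCharsets;

        // Toy model: readVal() reads a tag byte and dispatches; reading the tag
        // yourself and calling readStr() directly is the same code path minus
        // the dispatch, whenever the next value is known to be a string.
        public class TagDispatchDemo {
            static final byte STR_TAG = 1;          // invented tag value

            static String readStr(DataInputStream in) throws IOException {
                int len = in.readInt();
                byte[] buf = new byte[len];
                in.readFully(buf);
                return new String(buf, StandardCharsets.UTF_8);
            }

            static Object readVal(DataInputStream in) throws IOException {
                byte tag = in.readByte();
                if (tag == STR_TAG) return readStr(in);   // same path, via dispatch
                throw new IOException("unexpected tag " + tag);
            }

            public static void main(String[] args) throws IOException {
                byte[] payload = {STR_TAG, 0, 0, 0, 2, 'h', 'i'};
                // Path 1: generic dispatch through readVal
                System.out.println(readVal(new DataInputStream(new ByteArrayInputStream(payload))));
                // Path 2: consume the tag byte, then call readStr directly
                DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
                in.readByte();
                System.out.println(readStr(in));          // same result
            }
        }
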
        Noble Paul added a comment (edited) -

        Which test did you use? Did you use the checked-in tests or create new ones?

        What is important is probably the absolute numbers. If it is slower by a few nanoseconds and you get better memory efficiency, it should be worth it. As a percentage, the time taken in deserialization is already insignificant.

        A 100% or near-100% cache hit rate will not be unusual if you cache a few thousand strings, because our keys are mostly repeated.

        Noble Paul added a comment -

        Yeah, both are equivalent.

        Actually, even the old code using readVal() was redundant.

        Yonik Seeley added a comment -

        if it is slower by a few of nanoseconds

        I don't run things for a few nanoseconds in benchmarks. In this case, I ran enough decode iterations to run for at least 30 seconds per run - more than enough time for the cache to have a positive effect on garbage collection times as well.

        I am -1 on enabling/using this cache.

        ASF subversion and git services added a comment -

        Commit 1673270 from Yonik Seeley in branch 'dev/trunk'
        [ https://svn.apache.org/r1673270 ]

        SOLR-7110: tests - java7 compilable

        ASF subversion and git services added a comment -

        Commit 1673271 from Yonik Seeley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1673271 ]

        SOLR-7110: tests - java7 compilable

        Noble Paul added a comment (edited) -

        I ran the tests and consistently got better perf with the cache. Attached is the patch that modifies the tests to produce this output:

        THREADS=1
        ####### test started w/o cache
        return=0 THROUGHPUT=222965
        ####### test started with cache
        return=0 THROUGHPUT=269978
        cache: hits=18999981 lookups=19000000 size=19
        %age improvement with cache :21
        
        THREADS=2
        ####### test started w/o cache
        return=0 THROUGHPUT=269396
        ####### test started with cache
        return=0 THROUGHPUT=278086
        cache: hits=18999981 lookups=19000000 size=19
        %age improvement with cache :3
        
        THREADS=3
        ####### test started w/o cache
        return=0 THROUGHPUT=276090
        ####### test started with cache
        return=0 THROUGHPUT=285062
        cache: hits=18999981 lookups=19000000 size=19
        %age improvement with cache :3
        
        THREADS=4
        ####### test started w/o cache
        return=0 THROUGHPUT=275633
        ####### test started with cache
        return=0 THROUGHPUT=282246
        cache: hits=18999981 lookups=19000000 size=19
        %age improvement with cache :2
        
        
        Yonik Seeley added a comment -

        Please don't commit this patch to the perf benchmark - you shouldn't be running different alternatives in the same JVM run.

        Noble Paul added a comment -

        I'm not committing this. This is just to let you know how I got these numbers.

        Do you mean to say the numbers are vastly different if you run in different JVMs?

        Yonik Seeley added a comment -

        Do you mean to say the numbers are vastly different if you run in different JVMs?

        Nope. I'm just saying that running different variants of something in the same JVM throws mud in the water, and the results could be different.

        • the first variant run has to pay the hotspot cost to optimize; things tend to get faster the longer they run (up to a limit, of course)
        • hotspot will often specialize for the first variant run, and that can penalize later variants
        • GC costs will leak from one variant to other variants
        Noble Paul added a comment -

        the first variant run has to pay the hotspot cost to optimize; things tend to get faster the longer they run (up to a limit, of course)

        So we should do some simple warm-up before running all the tests.

        Anyway, what were the numbers you got?

        Noble Paul added a comment (edited) -

        Here is the result after adding a dry run at the beginning and a gc() between runs:

        THREADS=1
        ####### test started w/o cache
        return=0 THROUGHPUT=225835
        ####### test started with cache
        return=0 THROUGHPUT=273074
        cache: hits=18999981 lookups=19000000 size=19
        %age improvement with cache :20
        ====DRY RUN ignore the last====
        
        
        THREADS=1
        ####### test started w/o cache
        return=0 THROUGHPUT=257665
        ####### test started with cache
        return=0 THROUGHPUT=271296
        cache: hits=18999981 lookups=19000000 size=19
        %age improvement with cache :5
        
        THREADS=2
        ####### test started w/o cache
        return=0 THROUGHPUT=267881
        ####### test started with cache
        return=0 THROUGHPUT=276090
        cache: hits=18999981 lookups=19000000 size=19
        %age improvement with cache :3
        
        THREADS=3
        ####### test started w/o cache
        return=0 THROUGHPUT=266737
        ####### test started with cache
        return=0 THROUGHPUT=268744
        cache: hits=18999981 lookups=19000000 size=19
        %age improvement with cache :0
        
        THREADS=4
        ####### test started w/o cache
        return=0 THROUGHPUT=263782
        ####### test started with cache
        return=0 THROUGHPUT=278862
        cache: hits=18999981 lookups=19000000 size=19
        %age improvement with cache :5
        
        Yonik Seeley added a comment -

        So we should do some simple warm-up before running all the tests.

        No... it's much more difficult to account for all the mud in the water by running different options in the same JVM run. It's simplest to just not put mud in the water in the first place. One needs to run each variant multiple times in different JVM runs as well since hotspot can sometimes optimize one run pretty well by luck.

        Oh, and due to CPU speed changes from thermal throttling, it's probably best to alternate variants as well. I did something like the following:
        for i in 1 2 3 4 5 6 7 8 9 10; do test_variant1; test_variant2; done

        Anyway, what were the numbers you got?

        I gave the aggregate results... I don't have time to re-run them all now.

        Noble Paul added a comment -

        No... it's much more difficult to account for all the mud in the water by running different options in the same JVM run.

        It is well known that the first run pays the penalty anyway. That is obvious from the results I posted last: the no-cache run was 20% slower in the dry run, and the next run with the same set of params yielded only a 5% advantage.

        I ran the same test multiple times and never saw a case where the cache was slower.

        As we know that caching has better memory efficiency and the performance is marginally better, it is worth investigating the real performance before ruling out this solution.

        It's simplest to just not put mud in the water in the first place

        I don't think your approach of a fresh JVM is ideal, because in reality we have a JVM that is warmed. To me, the better approach is to run this several times and look at the consistency of the numbers across different runs, rather than running on a cold VM all the time. And my runs really show that caching consistently outperformed non-caching (and this is without taking GC costs into consideration at all).

        Yonik Seeley added a comment -

        I don't think your approach of a fresh JVM is ideal, because in reality we have a JVM that is warmed.

        a) "warming" the JVM with a different variant of what you are testing is bad; it will lead to hotspot specialization and then de-specialization.
        b) run the test for longer... most of the time will be
        c) if you really want to warm the JVM, do the same variant twice and just time the second one.

        It is well known that [...]

        If you insist on inferring performance rather than directly measuring it without throwing unnecessary mud in the water, then I'm just going to start ignoring your results and reiterate my -1.

        Anshum Gupta added a comment -

        Bulk close for 5.2.0.

        ASF subversion and git services added a comment -

        Commit 1703874 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1703874 ]

        SOLR-7110: Added entry to CHANGES.txt under 5.2.0

        ASF subversion and git services added a comment -

        Commit 1703875 from shalin@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1703875 ]

        Added entry for SOLR-7110 and SOLR-7050 in the right places in CHANGES.txt


          People

          • Assignee: Noble Paul
          • Reporter: Noble Paul
          • Votes: 0
          • Watchers: 6
