Log4j 2 / LOG4J2-467

Thread name caching in async logger incompatible with use of Thread.setName()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0-beta9
    • Fix Version/s: 2.0-rc1, 2.0
    • Component/s: Core
    • Labels:
      None
    • Environment:

      Debian Squeeze amd64
      OpenJDK 7u25

      Description

      AsyncLogger caches a thread's name in a thread-local info variable. I use a thread pool where the submitted Runnables call Thread.setName() at the beginning of their task, and the thread name is included in the log message. For an example of this pattern, see org.jboss.netty.util.ThreadRenamingRunnable in Netty 3.x. With the cached thread name, log messages contain whatever name the thread had when it logged for the first time, and as long as the thread doesn't terminate (such as a core pool thread), all log messages involving this thread will be erroneous. If Thread.getName() has a significant performance impact for async logging, I would be satisfied if this behavior were configurable, perhaps on a per-logger basis, so that the penalty only needs to be taken by users who make use of Thread.setName().
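The reported behavior can be sketched as below (a minimal, self-contained demo; the class and task names are hypothetical, and the ThreadLocal stands in for AsyncLogger's cached per-thread info):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CachedNameDemo {
    // Stand-in for AsyncLogger's per-thread info object: the name is captured
    // the first time this thread "logs" and never refreshed afterwards.
    static final ThreadLocal<String> CACHED_NAME = new ThreadLocal<String>() {
        @Override protected String initialValue() {
            return Thread.currentThread().getName();
        }
    };

    static String logEvent() {
        // A real logger would format a full event; here we only return the
        // thread name that would be recorded with it.
        return CACHED_NAME.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Runnable task1 = new Runnable() {
            public void run() {
                Thread.currentThread().setName("task-1");
                System.out.println("live=task-1 logged=" + logEvent());
            }
        };
        Runnable task2 = new Runnable() {
            public void run() {
                Thread.currentThread().setName("task-2");
                // Same pool thread, new live name, but stale cached name:
                System.out.println("live=task-2 logged=" + logEvent());
            }
        };
        pool.submit(task1).get();
        pool.submit(task2).get(); // prints live=task-2 logged=task-1
        pool.shutdown();
    }
}
```

On a pooled thread the cached name is captured on the first log call and never updated, so the second task logs under the first task's name.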

      Attachments

      1. PerfTestDriver.java
        15 kB
        Remko Popma
      2. PerfTestDriver.java
        16 kB
        Remko Popma

        Activity

        Remko Popma added a comment -

        Anthony, thanks for your feedback.
        Caching the thread name did give a significant performance improvement, so I would like to keep it as the default,
        but making it configurable is certainly possible.

        I'm still thinking about how to do this. (Perhaps do this as part of LOG4J2-321.)

        Do you need this to be configurable on a per-logger basis or is that a nice-to-have and would a global setting work for you?

        Gary Gregory added a comment -

        Was caching the thread name a premature optimization?

        It saves creating one string for each call to Thread#getName(), which would be called once per log event.

        Rather than adding the complexity of making this configurable, why not simply back out the caching?

        Remko Popma added a comment -

        Gary, I don't remember the exact numbers but I remember the difference was significant (10% or more).
        I think the trade-off is worthwhile.

        Anthony Baldocchi added a comment -

        Remko, thank you for your quick feedback. My use case does not require this to be configurable on a per-logger basis; a global setting would suit me fine. I mentioned per-logger configuration simply as a middle-ground option, in case there were an expedient way to enable it.

        Remko Popma added a comment -

        Anthony, understood, thanks!
        I've given it some thought and the per-logger config would be a lot more complex than a global config, so I'll start with the simple solution and will consider a per-logger solution if a strong use case comes up.

        Gary Gregory added a comment -

        Hi Remko,

        I find it hard to believe that a method call and one new String per log event make a 10% difference. Can you create a benchmark before we go on and make things even more complicated than they are? 10% of what is also a question.

        Gary

        Remko Popma added a comment - - edited

        Gary, I've tested the performance difference like you asked. I used the existing benchmark test that I used to generate the performance numbers for Async Loggers: org.apache.logging.log4j.core.async.perftest.*
        I modified the PerfTestDriver class to comment out all tests except for "Loggers all async", and tested for 1, 2 and 4 threads. Here are the results:

        Cached thread name (AsyncLogger line 203: info.cachedThreadName)
        1 thread : throughput: 4,380,953 ops/sec. (avg of 5 runs)
        2 threads: throughput: 2,481,713 ops/sec. (avg of 5 runs)
        4 threads: throughput: 5,712,182 ops/sec. (avg of 3 runs)
        
        Thread.currentThread().getName() (AsyncLogger line 202)
        1 thread : throughput: 3,762,280 ops/sec. (avg of 5 runs)
        2 threads: throughput: 2,805,796 ops/sec. (avg of 5 runs)
        4 threads: throughput: 2,193,664 ops/sec. (avg of 3 runs)
        

        I conclude that caching the thread name gives a significant performance benefit.

        I attached the PerfTestDriver class so you can run the tests yourself to verify these results if you want. Each run takes about 30 minutes. To toggle between Thread.currentThread().getName() and the cached thread name, you need to modify AsyncLogger and recompile.

        This command starts the test:

        cd log4j2
        mvn clean install -DskipTests
        cd log4j-core\target
        java -cp .;log4j-core-2.0RC1-SNAPSHOT-tests.jar;log4j-core-2.0RC1-SNAPSHOT.jar;..\..\log4j-api\target\log4j-api-2.0RC1-SNAPSHOT.jar;test-classes;c:\users\remko\.m2\repository/com/lmax/disruptor/3.2.0/disruptor-3.2.0.jar org.apache.logging.log4j.core.async.perftest.PerfTestDriver
        
        Remko Popma added a comment -

        Anthony,
        I've added an option to switch off thread name caching with system property -DAsyncLogger.ThreadNameStrategy=UNCACHED

        Ideally this should be configurable in the log4j2 configuration file (and there is a Jira ticket LOG4J2-321 to address this) but this option is consistent with other options for tuning AsyncLogger behaviour.

        Fixed in revision 1555076.
        Please verify and close.
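As a side note, the same switch can be set programmatically before Log4j 2 initializes; the property name is the one above, while the surrounding class is a hypothetical sketch:

```java
public class Bootstrap {
    public static void main(String[] args) {
        // Equivalent to -DAsyncLogger.ThreadNameStrategy=UNCACHED on the
        // command line; must run before Log4j 2 initializes AsyncLogger.
        System.setProperty("AsyncLogger.ThreadNameStrategy", "UNCACHED");
        // ... obtain loggers and log as usual ...
    }
}
```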

        Gary Gregory added a comment -

        Thank you for the update Remko. Interesting numbers!

        Anthony Baldocchi added a comment -

        Using a single-threaded executor, the thread names in my test log output change when the Runnable calls setName with this option set to UNCACHED, and remain fixed with CACHED, which is the expected behavior. Verified; thanks.

        Remko Popma added a comment - - edited

        Gary, after rebooting my windows laptop I'm getting different numbers.

        Update

        I made a mistake running these tests. These results are all for the CACHED thread name scenario. Results for the UNCACHED scenario are in a later comment below.

        Run 1: CACHED
        (1 thread) : throughput: 4,407,972 ops/sec.
        (2 threads): throughput: 2,717,954 ops/sec.
        (4 threads): throughput: 1,616,480 ops/sec.
        
        Run 2: UNCACHED
        (1 thread) : throughput: 4,124,748 ops/sec.
        (2 threads): throughput: 1,928,624 ops/sec.
        (4 threads): throughput: 3,006,960 ops/sec.
        
        Run 3: CACHED
        (1 thread) : throughput: 4,987,891 ops/sec.
        (2 threads): throughput: 1,933,868 ops/sec.
        (4 threads): throughput: 3,222,753 ops/sec.
        
        Run 4: UNCACHED
        (1 thread) : throughput: 4,670,605 ops/sec.
        (2 threads): throughput: 2,215,214 ops/sec. 
        (4 threads): throughput: 1,154,320 ops/sec. 
        

        CACHED still has an advantage over UNCACHED, but there seems to be a lot of variance between runs. Especially the 4-thread results have too much variance to be useful. (This may be because my laptop only has 4 virtual cores, and occasionally running OS tasks skew the results.)

        I'll try running the performance test on a Unix box early next week.

        Scott Deboy added a comment -

        What should be the bottleneck in this test? I ran this on my MBP and I see around 70% idle CPU, disk doesn't look like it's being thrashed.

        Remko Popma added a comment -

        Hi Scott!
        Good question, not sure.
        Disk access should be efficient, as the RandomAccessFileAppender does a lot of buffering and becomes more efficient (batching multiple events together into a single I/O call) when it falls behind the producer. It could be lock contention, although use of the Disruptor in Async Loggers should resolve most of that. If you have 4 cores, the single-threaded part of the test should show around 75% idle CPU.
        How many cores do you have? (What is an MBP, by the way?)

        Can you post your results here? (Just the throughput numbers are fine.)

        Scott Deboy added a comment -

        The test fails part way through for me since it isn't resolving the slf4j classes: Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory

        I am only getting through the single threaded tests before it fails.

        Assume I need the slf4j API and an implementation? What's the right command to get those included?

        MBP means MacBook Pro. It's a mid-2012, 2.3 GHz i7, Samsung 840 Pro SSD, with 16 GB of memory, on Mavericks. Once I have a good command to run I'll update the bug with results.

        Remko Popma added a comment -

        (away from PC)
        You shouldn't need slf4j to run the modified PerfTestDriver attached to this Jira.

        Just found one issue with that PerfTestDriver: it doesn't pass command line system properties to the PerfTest process it creates, so it will only allow one to test the CACHED scenario.

        Will fix PerfTestDriver and attach to this Jira.

        Scott Deboy added a comment -

        I'm not sure why, but I do get an slf4j error as you see above:

        Logback: Sync (single thread) (1/5)...Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory

        Remko Popma added a comment -

        Reopening so I can attach files and edit old comments.

        Remko Popma added a comment -

        Scott,
        If you see "Logback..." you are not using the PerfTestDriver class attached to this Jira. The attached PerfTestDriver has all tests commented out except the Log4J2 AsyncLogger test.

        I've attached a new version of PerfTestDriver to this Jira that correctly passes command line property -DAsyncLogger.ThreadNameStrategy=UNCACHED (or CACHED) to the actual test process.

        To run this test:

        • update log4j2 project from SVN
        • replace log4j2/log4j-core/src/test/java/org/apache/logging/log4j/core/async/perftest/PerfTestDriver.java with the file attached to this Jira
        • build with mvn clean install -DskipTests
        • cd log4j2/log4j-core/target
        • java -DAsyncLogger.ThreadNameStrategy=UNCACHED -cp .;log4j-core-2.0RC1-SNAPSHOT-tests.jar;log4j-core-2.0RC1-SNAPSHOT.jar;../../log4j-api/target/log4j-api-2.0RC1-SNAPSHOT.jar;test-classes;c:/users/remko/.m2/repository/com/lmax/disruptor/3.2.0/disruptor-3.2.0.jar org.apache.logging.log4j.core.async.perftest.PerfTestDriver
          • (You probably need to modify the path to disruptor-3.2.0.jar)
        Remko Popma added a comment -

        Performance test result update:

        The numbers I reported in the comment above are all results for the CACHED thread name scenario. Here they are again:

        CACHED

        (1 thread) : throughput: 4,407,972 ops/sec.
        (2 threads): throughput: 2,717,954 ops/sec.
        (4 threads): throughput: 1,616,480 ops/sec.
        
        (1 thread) : throughput: 4,124,748 ops/sec.
        (2 threads): throughput: 1,928,624 ops/sec.
        (4 threads): throughput: 3,006,960 ops/sec.
        
        (1 thread) : throughput: 4,987,891 ops/sec.
        (2 threads): throughput: 1,933,868 ops/sec.
        (4 threads): throughput: 3,222,753 ops/sec.
        
        (1 thread) : throughput: 4,670,605 ops/sec.
        (2 threads): throughput: 2,215,214 ops/sec. 
        (4 threads): throughput: 1,154,320 ops/sec. 
        

        I've re-run the tests with the updated PerfTestDriver and the -DAsyncLogger.ThreadNameStrategy=UNCACHED option:

        UNCACHED

        (1 thread) : throughput: 3,429,421 ops/sec.
        (2 threads): throughput: 1,688,868 ops/sec.
        (4 threads): throughput: 2,207,599 ops/sec.
        
        (1 thread) : throughput: 4,684,796 ops/sec.
        (2 threads): throughput: 1,232,496 ops/sec.
        (4 threads): throughput: 2,906,195 ops/sec.
        
        (1 thread) : throughput: 3,404,258 ops/sec.
        (2 threads): throughput: 1,821,020 ops/sec.
        (4 threads): throughput: 4,254,485 ops/sec.
        

        (All on Windows 7 (64-bit) with JDK 1.7.0_45, 2-core Intel i5-3317U CPU @1.70 GHz with hyperthreading switched on (4 virtual cores).)

        Discounting the 4-thread tests (too much variance) I get these averages (in million ops/sec):

                     Cached  Uncached
         1 thread    4.5     3.8
         2 threads   2.2     1.6

        I will re-run the tests on a Unix box with more cores next week (should have less variance).

        Scott Deboy added a comment - - edited

        Details about my machine:
        java version "1.7.0_45"
        16 G memory
        Mid-2012 MacBook Pro, 2.3 GHz i7
        Mavericks

        Summary? Is it fair to say this is a 'push'? At least on my box it looks like it is (or, you could argue, uncached is slightly better).

        Results:

        CACHED

        Ranking:
        1. Log4j2: Loggers all async (single thread): throughput: 7,023,291 ops/sec. latency(ns): avg=4137.8 99% < 16384.0 99.99% < 2936012.8 (53351 samples)
        2. Log4j2: Loggers all async (2 threads): throughput: 3,542,271 ops/sec. latency(ns): avg=2588.6 99% < 4505.6 99.99% < 2097152.0 (368968 samples)
        3. Log4j2: Loggers all async (4 threads): throughput: 1,895,205 ops/sec. latency(ns): avg=2431.8 99% < 4096.0 99.99% < 2097152.0 (741385 samples)

        Ranking:
        1. Log4j2: Loggers all async (single thread): throughput: 7,370,012 ops/sec. latency(ns): avg=4088.8 99% < 16384.0 99.99% < 3355443.2 (57130 samples)
        2. Log4j2: Loggers all async (2 threads): throughput: 3,602,343 ops/sec. latency(ns): avg=2448.9 99% < 3686.4 99.99% < 2097152.0 (406678 samples)
        3. Log4j2: Loggers all async (4 threads): throughput: 1,934,711 ops/sec. latency(ns): avg=2436.2 99% < 4096.0 99.99% < 2446677.3 (748575 samples)

        UNCACHED:

        Ranking:
        1. Log4j2: Loggers all async (single thread): throughput: 6,818,683 ops/sec. latency(ns): avg=3747.8 99% < 16384.0 99.99% < 2936012.8 (64379 samples)
        2. Log4j2: Loggers all async (2 threads): throughput: 3,915,888 ops/sec. latency(ns): avg=2616.1 99% < 4096.0 99.99% < 2306867.2 (368855 samples)
        3. Log4j2: Loggers all async (4 threads): throughput: 1,911,439 ops/sec. latency(ns): avg=2464.6 99% < 4096.0 99.99% < 2097152.0 (737069 samples)

        Ranking:
        1. Log4j2: Loggers all async (single thread): throughput: 7,449,397 ops/sec. latency(ns): avg=3675.4 99% < 16384.0 99.99% < 2516582.4 (61476 samples)
        2. Log4j2: Loggers all async (2 threads): throughput: 4,031,314 ops/sec. latency(ns): avg=2578.6 99% < 3481.6 99.99% < 2516582.4 (388410 samples)
        3. Log4j2: Loggers all async (4 threads): throughput: 2,625,581 ops/sec. latency(ns): avg=2454.1 99% < 4096.0 99.99% < 2970965.3 (764774 samples)

        Remko Popma added a comment -

        Marking this as resolved again (I only reopened it to attach a file and edit an older comment).

        Remko Popma added a comment -

        Interesting to see that last year's consumer hardware (Scott's MBP) is much faster than 5-year-old enterprise hardware...

        Below are the results of running the perf tests on Solaris 10 (64-bit) with JDK 1.7.0_06, dual 4-core Xeon X5570 CPUs @2.93 GHz with hyperthreading switched on (16 virtual cores):

        CACHED

        (1 thread) : throughput: 2,993,579 ops/sec. (avg of 5 runs)
        (2 threads): throughput: 1,643,640 ops/sec. (avg of 5 runs)
        (4 threads): throughput: 1,509,098 ops/sec. (avg of 3 runs)
        
        (1 thread) : throughput: 2,905,708 ops/sec. (     "       )
        (2 threads): throughput: 1,887,124 ops/sec. (     "       )
        (4 threads): throughput: 1,403,921 ops/sec. (     "       )
        
        (1 thread) : throughput: 3,031,088 ops/sec. (     "       )
        (2 threads): throughput: 1,828,200 ops/sec. (     "       )
        (4 threads): throughput: 1,517,553 ops/sec. (     "       )
        
        (1 thread) : throughput: 2,936,223 ops/sec. (     "       )
        (2 threads): throughput: 1,505,578 ops/sec. (     "       )
        (4 threads): throughput: 1,184,906 ops/sec. (     "       )
        

        UNCACHED

        (1 thread) : throughput: 2,368,698 ops/sec. (     "       )
        (2 threads): throughput: 1,360,309 ops/sec. (     "       )
        (4 threads): throughput:   998,752 ops/sec. (     "       )
        
        (1 thread) : throughput: 2,396,895 ops/sec. (     "       )
        (2 threads): throughput: 1,347,167 ops/sec. (     "       )
        (4 threads): throughput: 1,179,600 ops/sec. (     "       )
        
        (1 thread) : throughput: 2,354,092 ops/sec. (     "       )
        (2 threads): throughput: 1,444,437 ops/sec. (     "       )
        (4 threads): throughput: 1,089,047 ops/sec. (     "       )
        
        (1 thread) : throughput: 2,465,949 ops/sec. (     "       )
        (2 threads): throughput: 1,231,593 ops/sec. (     "       )
        (4 threads): throughput: 1,144,265 ops/sec. (     "       )
        

        The one thing that the enterprise hardware has going for it is that there is very little variance between runs.

        Summary (in million ops/sec):

                     Cached  Uncached
         1 thread    3.0     2.4
         2 threads   1.7     1.3
         4 threads   1.4     1.1

        On the Windows and Unix platforms I tested, caching the thread name seems to boost performance. I can't explain Scott's results. However, I think the results as a whole justify the slightly added complexity of caching the thread name.


          People

          • Assignee: Remko Popma
          • Reporter: Anthony Baldocchi
          • Votes: 0
          • Watchers: 4
