Velocity
  1. Velocity
  2. VELOCITY-606

Velocity 1.5 performance bottlenecks

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6
    • Component/s: Engine
    • Labels:
      None
    • Environment:
      Win XP, 1 Gb, single core, Maven 2, JUnitPerf, JRat, cached Velocity templates with a ClassLoader

      Description

      I did some quite extensive profiling to identify performance bottlenecks in Velocity 1.5.

      Using Maven 2, JUnitPerf and JRat I was able to identify these methods as top bottlenecks:

      org.apache.velocity.util.introspection ClassMap - findMethod(String,Object[])
      org.apache.velocity.util.introspection IntrospectorBase - getMethod(Class,String,Object[])
      org.apache.velocity.runtime.parser.node SimpleNode - literal()
      org.apache.velocity.runtime.parser.node SimpleNode - render(InternalContextAdapter,Writer)
      org.apache.commons.collections ExtendedProperties - getBoolean(String,boolean)
      org.apache.velocity.runtime.parser.node ASTReference - render(InternalContextAdapter,Writer)

      The first two eat over 50% of the CPU with many threads. See attached screenshots.

      Interestingly enough the synchronized
      org.apache.velocity.runtime RuntimeInstance getTemplate(String,String)

      isn't a big problem when templates are cached. However, if all resources are not cached it becomes a serious performance bottleneck. ResourceCacheImpl also uses a synchronized map which slows things down.

      I think these bottlenecks could be at least made less worse by reducing synchronization by using ConcurrentHashMap and StringBuilder that ship with JDK 1.5. I'm investigating what kind of benefits could be achieved with those.

      1. velocity-1.5-250-threads-loadtest.PNG
        127 kB
        Jarkko Viinamäki
      2. velocity-1.5-50-threads-loadtest.PNG
        127 kB
        Jarkko Viinamäki
      3. VELOCITY-606-light.patch
        11 kB
        Nathan Bubna
      4. VELOCITY-606.patch
        21 kB
        Nathan Bubna
      5. velocity-1.6-dev-concurrentmods.patch
        26 kB
        Jarkko Viinamäki
      6. velocity-1.6-dev-concurrentpatch-250-threads-loadtest.PNG
        119 kB
        Jarkko Viinamäki
      7. velocity-1.6-head-20080725-test.vm.PNG
        116 kB
        Jarkko Viinamäki

        Activity

        Hide
        Nathan Bubna added a comment -

        Ok, all patches provided for this issue so far have been committed. So far as i can tell using Jarkko's testbench, the hotspots he identified at the top of this issue have all been cooled significantly. Rather than continue using this as a catch-all for performance bottlenecks, i'm marking this resolved and leaving remaining performance problems to their own JIRA issues (e.g. VELOCITY-607).

        Jarkko, if you feel this issue is not resolved, please feel free to re-open!

        Show
        Nathan Bubna added a comment - Ok, all patches provided for this issue so far have been committed. So far as i can tell using Jarkko's testbench, the hotspots he identified at the top of this issue have all been cooled significantly. Rather than continue using this as a catch-all for performance bottlenecks, i'm marking this resolved and leaving remaining performance problems to their own JIRA issues (e.g. VELOCITY-607 ). Jarkko, if you feel this issue is not resolved, please feel free to re-open!
        Hide
        Nathan Bubna added a comment -

        Ok, so when i started helping with this, i assumed most synchronization was present because it needed to be for some reason, and that i could largely help by making it more fine-grained. I'm increasingly convinced that many of the synchronization hotspots are totally unnecessary. Particularly, it seems there's quite a few places where it is harmless to let multiple threads initially pass by a "get" duplicate creation/search work and then override earlier "put" calls. I've removed synchronization from some of them, all tests pass and there are no errors when load testing with Jarkko's testbench. I'll keep an eye out for more of these...

        Show
        Nathan Bubna added a comment - Ok, so when i started helping with this, i assumed most synchronization was present because it needed to be for some reason, and that i could largely help by making it more fine-grained. I'm increasingly convinced that many of the synchronization hotspots are totally unnecessary. Particularly, it seems there's quite a few places where it is harmless to let multiple threads initially pass by a "get" duplicate creation/search work and then override earlier "put" calls. I've removed synchronization from some of them, all tests pass and there are no errors when load testing with Jarkko's testbench. I'll keep an eye out for more of these...
        Hide
        Nathan Bubna added a comment -

        Yeah, VelocimacroFactory.startMacroRendering is clearly mis-designed to restrict the number of concurrent macro calls, rather than the depth of them. Ugh.

        Show
        Nathan Bubna added a comment - Yeah, VelocimacroFactory.startMacroRendering is clearly mis-designed to restrict the number of concurrent macro calls, rather than the depth of them. Ugh.
        Hide
        Jarkko Viinamäki added a comment -

        JRat profiling results of SVN revision (679708) with test.vm (included in my testbench)

        Show
        Jarkko Viinamäki added a comment - JRat profiling results of SVN revision (679708) with test.vm (included in my testbench)
        Hide
        Jarkko Viinamäki added a comment - - edited

        OK. I did some profiling with latest SVN head (679708). I have some good news and bad news.

        The good news is that the current head is slightly faster than 1.5 with some of my test templates. InspectorBase.getMethod still seems to be a major bottleneck after applying patches 606 and 595.

        However, the bad news is that modifications made to the macro system in 1.6 have made concurrent macro processing extremely slow and templates with many invocations to the same macro sometimes fail with "org.apache.velocity.exception.MacroOverflowException: Exceed maximum 20 macro calls. " when there are many concurrent threads. I'll file a separate issue about this.

        My pathetic little testbench is available at: http://www.iki.fi/wyla/velocity/testbench (free to use)

        With that it's relatively easy to run load tests and use JRat as profiler.

        Show
        Jarkko Viinamäki added a comment - - edited OK. I did some profiling with latest SVN head (679708). I have some good news and bad news. The good news is that the current head is slightly faster than 1.5 with some of my test templates. InspectorBase.getMethod still seems to be a major bottleneck after applying patches 606 and 595. However, the bad news is that modifications made to the macro system in 1.6 have made concurrent macro processing extremely slow and templates with many invocations to the same macro sometimes fail with "org.apache.velocity.exception.MacroOverflowException: Exceed maximum 20 macro calls. " when there are many concurrent threads. I'll file a separate issue about this. My pathetic little testbench is available at: http://www.iki.fi/wyla/velocity/testbench (free to use) With that it's relatively easy to run load tests and use JRat as profiler.
        Hide
        Nathan Bubna added a comment -

        Ok, i went ahead and adapted the RuntimeInstance, Macro, SimpleNode and ASTSetDirective changes from your patch and committed them. I still intend to look at and test the ResourceManagerImpl section of your patch, and will wait on your feedback on my changes to MethodMap and ClassMap before committing those. Though, i may upload a new patch for those that includes use of StrBuilder for you test.

        Show
        Nathan Bubna added a comment - Ok, i went ahead and adapted the RuntimeInstance, Macro, SimpleNode and ASTSetDirective changes from your patch and committed them. I still intend to look at and test the ResourceManagerImpl section of your patch, and will wait on your feedback on my changes to MethodMap and ClassMap before committing those. Though, i may upload a new patch for those that includes use of StrBuilder for you test.
        Hide
        Nathan Bubna added a comment -

        For Velocity 1.6, i upgraded the Commons-Lang dependency to v2.4. So, StrBuilder is available for use now.

        Show
        Nathan Bubna added a comment - For Velocity 1.6, i upgraded the Commons-Lang dependency to v2.4. So, StrBuilder is available for use now.
        Hide
        Jarkko Viinamäki added a comment -

        Thanks for your quick analysis and comments Nathan! Much appreciated.

        Your patch seems elegant so I'll checkout the source, apply the patches and run my profiling tests with those.

        Show
        Jarkko Viinamäki added a comment - Thanks for your quick analysis and comments Nathan! Much appreciated. Your patch seems elegant so I'll checkout the source, apply the patches and run my profiling tests with those.
        Hide
        Nathan Bubna added a comment -

        That's quite a wide mix of changes. Unfortunately, we will not be adding a dependency on JDK 1.5 for Velocity 1.6, so many of those can't be used. Yes, the new versions are faster, but there are many other concerns and interests out there besides performance. I'm certainly going to be in the crowd lobbying for a JDK 1.5 requirement in any Velocity 1.7 version, but it just isn't time to go there yet. And yeah, it sounds like you and i have heard similar things about the dangers of ConcurrentHashMap implementations under the older memory models. As for Commons-Lang's StrBuilder, we already depend on some Commons-Lang classes, so i think we ought to upgrade our version and use that class for now instead of StringBuffer or JDK 1.5's StringBuilder.

        So, while i appreciate the work you've done, would you consider adapting it to fit our JDK 1.4 requirement?

        Also, it looks like my patch(es) contain a number of things that yours does not. As i do not have any profiling tools or tests set up on my machine, i can't confirm the improvements. If you are looking for gains that do not require JDK 1.5, you might consider testing some of my changes as well. They should offer some improvement without using ConcurrentHashMap.

        Finally, i'm impressed by the way you went about de-synchronizing ResourceManagerImpl. It looks like a reasonable fix for VELOCITY-595. I'll definitely be giving that a try and posting it over on that issue too.

        Show
        Nathan Bubna added a comment - That's quite a wide mix of changes. Unfortunately, we will not be adding a dependency on JDK 1.5 for Velocity 1.6, so many of those can't be used. Yes, the new versions are faster, but there are many other concerns and interests out there besides performance. I'm certainly going to be in the crowd lobbying for a JDK 1.5 requirement in any Velocity 1.7 version, but it just isn't time to go there yet. And yeah, it sounds like you and i have heard similar things about the dangers of ConcurrentHashMap implementations under the older memory models. As for Commons-Lang's StrBuilder, we already depend on some Commons-Lang classes, so i think we ought to upgrade our version and use that class for now instead of StringBuffer or JDK 1.5's StringBuilder. So, while i appreciate the work you've done, would you consider adapting it to fit our JDK 1.4 requirement? Also, it looks like my patch(es) contain a number of things that yours does not. As i do not have any profiling tools or tests set up on my machine, i can't confirm the improvements. If you are looking for gains that do not require JDK 1.5, you might consider testing some of my changes as well. They should offer some improvement without using ConcurrentHashMap. Finally, i'm impressed by the way you went about de-synchronizing ResourceManagerImpl. It looks like a reasonable fix for VELOCITY-595 . I'll definitely be giving that a try and posting it over on that issue too.
        Hide
        Jarkko Viinamäki added a comment -

        JRat profiling screenshot after modifications.

        Show
        Jarkko Viinamäki added a comment - JRat profiling screenshot after modifications.
        Hide
        Jarkko Viinamäki added a comment -

        Lots of small modifications to improve performance. Thread-safety and locking issues may exist and have not been fully verified.

        Show
        Jarkko Viinamäki added a comment - Lots of small modifications to improve performance. Thread-safety and locking issues may exist and have not been fully verified.
        Hide
        Jarkko Viinamäki added a comment -

        I made some modifications using java.util.concurrent and some small tricks to optimize the code. In my tests (which uses macros and most of the velocity directives) I was able to get over 20% performance boost.

        Although all tests pass and I was able to run Velocity app on Jetty with quite high load (70 concurrent users constantly bombarding with 0 delay) and I didn't see any errors, there can certainly be concurrency problems, race conditions and all that nasty stuff in my modifications. Feel free to laugh at my incompetence.

        Modified jar, patch and screenshots are here:
        http://www.iki.fi/wyla/velocity/

        I'll also include them as attachment.

        Regarding ConcurrentHashMap: if I'm not mistaken, there was some fix in the Java memory model in JDK 1.5 so there might be some problems with the backported version. Also licensing issues regarding that are a bit unclear. Further JDK 1.6 is faster than 1.5 and 1.5 is faster than 1.4 so I wouldn't use an old version in production.

        Show
        Jarkko Viinamäki added a comment - I made some modifications using java.util.concurrent and some small tricks to optimize the code. In my tests (which uses macros and most of the velocity directives) I was able to get over 20% performance boost. Although all tests pass and I was able to run Velocity app on Jetty with quite high load (70 concurrent users constantly bombarding with 0 delay) and I didn't see any errors, there can certainly be concurrency problems, race conditions and all that nasty stuff in my modifications. Feel free to laugh at my incompetence. Modified jar, patch and screenshots are here: http://www.iki.fi/wyla/velocity/ I'll also include them as attachment. — Regarding ConcurrentHashMap: if I'm not mistaken, there was some fix in the Java memory model in JDK 1.5 so there might be some problems with the backported version. Also licensing issues regarding that are a bit unclear. Further JDK 1.6 is faster than 1.5 and 1.5 is faster than 1.4 so I wouldn't use an old version in production.
        Hide
        Nathan Bubna added a comment -

        Ok, here's a more significant patch that should speed up MethodMap.findMethod somewhat, uses less synchronization in general, and uses much finer-grained locking for ClassMap$MethodCache.get. All tests still pass, but i still haven't run any performance testing on this.

        I'll try to find some time to set up a profiler and see if these things really make a difference, but i would be much obliged if someone else had the time to do so.

        Oh, and this is all without using ConcurrentHashMap or StrBuilder, as i'd prefer to first see what can be done w/o introducing new dependencies.

        Show
        Nathan Bubna added a comment - Ok, here's a more significant patch that should speed up MethodMap.findMethod somewhat, uses less synchronization in general, and uses much finer-grained locking for ClassMap$MethodCache.get. All tests still pass, but i still haven't run any performance testing on this. I'll try to find some time to set up a profiler and see if these things really make a difference, but i would be much obliged if someone else had the time to do so. Oh, and this is all without using ConcurrentHashMap or StrBuilder, as i'd prefer to first see what can be done w/o introducing new dependencies.
        Hide
        Bob Schellink added a comment -

        Hi Nathan,

        I don't think the ConcurrentHashMap is a backport. It was around before Java 1.5 was released: http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html

        Still the author (Doug Lea) admits it not being as good as the 1.5 implementation though.

        kind regards

        bob

        Show
        Bob Schellink added a comment - Hi Nathan, I don't think the ConcurrentHashMap is a backport. It was around before Java 1.5 was released: http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html Still the author (Doug Lea) admits it not being as good as the 1.5 implementation though. kind regards bob
        Hide
        Nathan Bubna added a comment -

        Ok, this patch makes small improvements to makeMethodKey(), populateMethodCache(), and uses finer-grained locking in MethodCache.get/put.

        However, as i did this, i realize there's a lot more that could be done. MethodCache.put and MethodMap.add involve synchronization but are all done being used internally upon initialization of a ClassMap instance and are not externally accessible. It seems like ensuring ClassMap is not used until it is done being constructed would allow us to limit synchronization to MethodCache.get, and even that could probably be reduced to locking on a per-methodKey basis (similar to my patch for VELOCITY-595) if it turns out that MethodMap.find is where most time is spent.

        Show
        Nathan Bubna added a comment - Ok, this patch makes small improvements to makeMethodKey(), populateMethodCache(), and uses finer-grained locking in MethodCache.get/put. However, as i did this, i realize there's a lot more that could be done. MethodCache.put and MethodMap.add involve synchronization but are all done being used internally upon initialization of a ClassMap instance and are not externally accessible. It seems like ensuring ClassMap is not used until it is done being constructed would allow us to limit synchronization to MethodCache.get, and even that could probably be reduced to locking on a per-methodKey basis (similar to my patch for VELOCITY-595 ) if it turns out that MethodMap.find is where most time is spent.
        Hide
        Nathan Bubna added a comment -

        Jarkko, can you tell where most time in ClassMap.findMethod/MethodCache.get is spent? Is waiting for locks to be released or is it in MethodMap.find?

        I'm hesitant to use any pre-1.5 ConcurrentHashMap implementation, as i have read things about problems with backported versions. I've been given the impression that it can't really be done right with the older JMM. Commons-lang StrBuilder is fine.

        I do, however, wonder how much longer we'll be supporting 1.4, as even Sun will have finished "End-of-life"-ing it in October. I think once Velocity 1.6 is released, we should move to JDK 1.5. This will open the door to many, many improvements.

        Anyway, back to the question at hand, i looked over ClassMap and do see a few places little things can be improved. For instance, not all that happens within MethodCache.get/put needs to be synchronized, like the call to makeMethodKey(). Also, though it's not in the find() method, the populateMethodCache() method seems quite overwrought and inefficient. I don't see why it we call getMethods() and then checking each Method to see if it's public. We should either drop the check or use getDeclaredMethods(). I prefer the latter, as checking the modifiers of a lot of non-public methods is likely to be faster than repeatedly doing MethodCache.put for inherited public methods.

        Of course, i don't having any profiling set up on my machine right now, as the stuff i work on for my company these days isn't the bottleneck for our apps. I'll attach a patch with my suggestions, but i won't commit it unless someone can confirm the performance improvements for me. (on that note, i'm also hoping someone can test my patch for VELOCITY-595 as well .

        Show
        Nathan Bubna added a comment - Jarkko, can you tell where most time in ClassMap.findMethod/MethodCache.get is spent? Is waiting for locks to be released or is it in MethodMap.find? I'm hesitant to use any pre-1.5 ConcurrentHashMap implementation, as i have read things about problems with backported versions. I've been given the impression that it can't really be done right with the older JMM. Commons-lang StrBuilder is fine. I do, however, wonder how much longer we'll be supporting 1.4, as even Sun will have finished "End-of-life"-ing it in October. I think once Velocity 1.6 is released, we should move to JDK 1.5. This will open the door to many, many improvements. Anyway, back to the question at hand, i looked over ClassMap and do see a few places little things can be improved. For instance, not all that happens within MethodCache.get/put needs to be synchronized, like the call to makeMethodKey(). Also, though it's not in the find() method, the populateMethodCache() method seems quite overwrought and inefficient. I don't see why it we call getMethods() and then checking each Method to see if it's public. We should either drop the check or use getDeclaredMethods(). I prefer the latter, as checking the modifiers of a lot of non-public methods is likely to be faster than repeatedly doing MethodCache.put for inherited public methods. Of course, i don't having any profiling set up on my machine right now, as the stuff i work on for my company these days isn't the bottleneck for our apps. I'll attach a patch with my suggestions, but i won't commit it unless someone can confirm the performance improvements for me. (on that note, i'm also hoping someone can test my patch for VELOCITY-595 as well .
        Hide
        Will Glass-Husain added a comment -

        That's really interesting.

        Did your templates include macros or includes? There's been significant performance improvements with macros and includes in the (unreleased) Velocity 1.6 code base.

        Show
        Will Glass-Husain added a comment - That's really interesting. Did your templates include macros or includes? There's been significant performance improvements with macros and includes in the (unreleased) Velocity 1.6 code base.
        Hide
        Malcolm Edgar added a comment -

        Very interesting, always keen to see how we can improve Velocities performance.

        With regard to the alternative classes, I believe supporting JDK .14 is still very important in Velocity so I would be keen to use alternative 1.4 compatible implementations:

        EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap

        and

        org.apache.commons.lang.text.StrBuilder

        regards Malcolm Edgar

        Show
        Malcolm Edgar added a comment - Very interesting, always keen to see how we can improve Velocities performance. With regard to the alternative classes, I believe supporting JDK .14 is still very important in Velocity so I would be keen to use alternative 1.4 compatible implementations: EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap and org.apache.commons.lang.text.StrBuilder regards Malcolm Edgar

          People

          • Assignee:
            Unassigned
            Reporter:
            Jarkko Viinamäki
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development