OpenJPA
  1. OpenJPA
  2. OPENJPA-407

Cache SQL (or closer precursors to SQL) more aggressively

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.9.6, 0.9.7, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.1.0, 1.1.1
    • Fix Version/s: 1.2.0
    • Component/s: jdbc, kernel, query, sql
    • Labels:
      None

      Description

      When data is not available in the data cache, OpenJPA dynamically creates SQL to look up the requested data. OpenJPA should more aggressively cache this SQL to accelerate pathways from a cache miss to the database.

      The generated SQL takes a number of factors into account, including the requested records, transaction status, currently-loaded data, and the current fetch configuration. Any caching would need to account for these factors as well.

      1. findBy.patch
        10 kB
        Kevin Sutter
      2. OPENJPA-407.patch
        80 kB
        Fay Wang
      3. OPENJPA-407.patch
        81 kB
        Jeremy Bauer
      4. OPENJPA-407.patch
        55 kB
        Kevin Sutter
      5. OPENJPA-407.patch
        66 kB
        Patrick Linskey
      6. QuerySQLCache.doc
        63 kB
        Jeremy Bauer

        Issue Links

          Activity

          Hide
          Kevin Sutter added a comment -

          I'm marking this Issue as Resolved for trunk (1.2.0). Version 1.1.1 is still listed as a potential Fix Version, but I'll leave that up to somebody else to see if it applies or if they want it.

          Additional sql generation optimizations or caching should be done via new related Issues.

          Thanks,
          Kevin

          Show
          Kevin Sutter added a comment - I'm marking this Issue as Resolved for trunk (1.2.0). Version 1.1.1 is still listed as a potential Fix Version, but I'll leave that up to somebody else to see if it applies or if they want it. Additional sql generation optimizations or caching should be done via new related Issues. Thanks, Kevin
          Hide
          Michael Dick added a comment -

          I've committed Fay & Jeremy's patch to trunk. The JIRA issue indicates that this issue is still a candidate for 1.1.0 - if it's deemed safe enough to put in 1.1.0 it should be easy to merge across.

          Show
          Michael Dick added a comment - I've committed Fay & Jeremy's patch to trunk. The JIRA issue indicates that this issue is still a candidate for 1.1.0 - if it's deemed safe enough to put in 1.1.0 it should be easy to merge across.
          Hide
          Fay Wang added a comment -

          Per Patrick's suggestions, this patch removes the equals() and hashCode() methods from JDBCFetchConfigurationImpl into the inner cache key class (SelectKey class) . It also sets fetchGroupContainsDefault to true in FetchConfigurationImpl.clearFetchGroups().

          Show
          Fay Wang added a comment - Per Patrick's suggestions, this patch removes the equals() and hashCode() methods from JDBCFetchConfigurationImpl into the inner cache key class (SelectKey class) . It also sets fetchGroupContainsDefault to true in FetchConfigurationImpl.clearFetchGroups().
          Hide
          Fay Wang added a comment -

          You are right that the equals and hashCode are for general purpose. It is risky to have the special purpose hashCode and equals methods in JDBCFetchConfigurationImpl. We will move the hashCode and equals from JDBCFetchConfigurationImpl into the SelectKey class to avoid interfering with other usage of hashCode and equals. I will post a patch soon with the changes suggested by Patrick.

          Show
          Fay Wang added a comment - You are right that the equals and hashCode are for general purpose. It is risky to have the special purpose hashCode and equals methods in JDBCFetchConfigurationImpl. We will move the hashCode and equals from JDBCFetchConfigurationImpl into the SelectKey class to avoid interfering with other usage of hashCode and equals. I will post a patch soon with the changes suggested by Patrick.
          Hide
          Patrick Linskey added a comment -

          > Perhaps we need to add a document indicating that the hashCode
          > and equals methods in JDBCFetchConfigurationImpl are used in
          > querySQLCache for findBy operation.

          I'm a bit nervous about having equals() and hashCode() (which are by convention deemed to apply to an entire instance) only be meaningful in certain contexts. I think that it would make more sense to create some sort of inner cache key class whose equals() and hashCode() are custom-designed for the purposes of this cache.

          > Any other usage of hashCode/equals should consider the fetchInnderJoin?

          This change actually interfered with my OPENJPA-522 fix; I ended up switching to an IdentityHashMap to avoid the issue. At the time, I did not realize that the addition of equals() / hashCode() had caused this. I would not be surprised if there are other places in OpenJPA that implicitly make assumptions about equals() and hashCode() behavior for FetchConfigurations, although I can't think of any offhand.

          In general, I'm not a huge fan of equals() / hashCode() methods (and especially hashCode()) that change in value over the lifecycle of the object; this tends to make it very hard to put these types into data structures.

          > (5) In terms of which fields will go into the calcuation of hashcode and equals, we are
          > considering those fields that can be changed at the runtime via FetchPlan, since these
          > will be directly affect the generated sql. For those which could not be changed at runtime,
          > we are not including them in the hashcode or equals, hoping in doing so, we could
          > minimize the effort in the calculation of cache key.

          As I mentioned above, this seems fair for the purposes of a cache key, but not appropriate for the general equals() / hashCode() implementation for the instance.

          Show
          Patrick Linskey added a comment - > Perhaps we need to add a document indicating that the hashCode > and equals methods in JDBCFetchConfigurationImpl are used in > querySQLCache for findBy operation. I'm a bit nervous about having equals() and hashCode() (which are by convention deemed to apply to an entire instance) only be meaningful in certain contexts. I think that it would make more sense to create some sort of inner cache key class whose equals() and hashCode() are custom-designed for the purposes of this cache. > Any other usage of hashCode/equals should consider the fetchInnderJoin? This change actually interfered with my OPENJPA-522 fix; I ended up switching to an IdentityHashMap to avoid the issue. At the time, I did not realize that the addition of equals() / hashCode() had caused this. I would not be surprised if there are other places in OpenJPA that implicitly make assumptions about equals() and hashCode() behavior for FetchConfigurations, although I can't think of any offhand. In general, I'm not a huge fan of equals() / hashCode() methods (and especially hashCode()) that change in value over the lifecycle of the object; this tends to make it very hard to put these types into data structures. > (5) In terms of which fields will go into the calcuation of hashcode and equals, we are > considering those fields that can be changed at the runtime via FetchPlan, since these > will be directly affect the generated sql. For those which could not be changed at runtime, > we are not including them in the hashcode or equals, hoping in doing so, we could > minimize the effort in the calculation of cache key. As I mentioned above, this seems fair for the purposes of a cache key, but not appropriate for the general equals() / hashCode() implementation for the instance.
          Hide
          Fay Wang added a comment -

          Hi Patrick,

          Thank you for your comments.
          (1) regarding the clearFetchGroups, you are right. We will set the fetchGroupContainsDefault to true in this method.

          (2) The querySQLCacheInstance in JDBCConfigurationImpl is a map of maps. This map is hard coded as a HashMap, and can be accessed by multiple entity mangers. The key in this map is the class of the context where the SelectImpl is constructed, and the value is either CacheMap or ConcurrentCacheMap, depending on the qerySQLCache configuration plugin. Addmittedly, this usage of PluginValue is somehow different from the conventional usage.
          Currently, the querySQLCacheInstance has three entries: (see the extra doc that we attached earlier).

          key: org.apache.openjpa.jdbc.kernel.JDBCStoreManager
          value: CacheMap / ConcurrentCacheMap (depending on the plugin value of querySQLCache)

          key: org.apache.openjpa.jdbc.meta.strats.StoreCollectionFieldStrategy
          value: CacheMap / ConcurrentCacheMap (depending on the plugin value of querySQLCache)

          key: org.apache.openjpa.jdbc.meta.starts.RelationFieldStrategy
          value: CacheMap / ConcurrentCacheMap (depending on the plugn value of querySQLCache)

          (3) Regarding fetchInnerJoin, this field is newly added for JIRA-547, a fix for dynamic query involving inner fetch join. The addInnerJoinFetch is only called from JDBCStoreQuery. Since the patch we attached is for findBy only, and we want to keep the caculation of hash code as simple as possible., this fetchInnerJoin is therefore not in the calculation of the hashCode for JDBCFetchConfigurationImpl. Perhaps we need to add a document indicating that the hashCode and equals methods in JDBCFetchConfigurationImpl are used in querySQLCache for findBy operation. Any other usage of hashCode/equals should consider the fetchInnderJoin?

          (4) As you suggested, we will keep the ordering of hashcodes and equals the same.

          (5) In terms of which fields will go into the calcuation of hashcode and equals, we are considering those fields that can be changed at the runtime via FetchPlan, since these will be directly affect the generated sql. For those which could not be changed at runtime, we are not including them in the hashcode or equals, hoping in doing so, we could minimize the effort in the calculation of cache key.

          Show
          Fay Wang added a comment - Hi Patrick, Thank you for your comments. (1) regarding the clearFetchGroups, you are right. We will set the fetchGroupContainsDefault to true in this method. (2) The querySQLCacheInstance in JDBCConfigurationImpl is a map of maps. This map is hard coded as a HashMap, and can be accessed by multiple entity mangers. The key in this map is the class of the context where the SelectImpl is constructed, and the value is either CacheMap or ConcurrentCacheMap, depending on the qerySQLCache configuration plugin. Addmittedly, this usage of PluginValue is somehow different from the conventional usage. Currently, the querySQLCacheInstance has three entries: (see the extra doc that we attached earlier). key: org.apache.openjpa.jdbc.kernel.JDBCStoreManager value: CacheMap / ConcurrentCacheMap (depending on the plugin value of querySQLCache) key: org.apache.openjpa.jdbc.meta.strats.StoreCollectionFieldStrategy value: CacheMap / ConcurrentCacheMap (depending on the plugin value of querySQLCache) key: org.apache.openjpa.jdbc.meta.starts.RelationFieldStrategy value: CacheMap / ConcurrentCacheMap (depending on the plugn value of querySQLCache) (3) Regarding fetchInnerJoin, this field is newly added for JIRA-547, a fix for dynamic query involving inner fetch join. The addInnerJoinFetch is only called from JDBCStoreQuery. Since the patch we attached is for findBy only, and we want to keep the caculation of hash code as simple as possible., this fetchInnerJoin is therefore not in the calculation of the hashCode for JDBCFetchConfigurationImpl. Perhaps we need to add a document indicating that the hashCode and equals methods in JDBCFetchConfigurationImpl are used in querySQLCache for findBy operation. Any other usage of hashCode/equals should consider the fetchInnderJoin? (4) As you suggested, we will keep the ordering of hashcodes and equals the same. (5) In terms of which fields will go into the calcuation of hashcode and equals, we are considering those fields that can be changed at the runtime via FetchPlan, since these will be directly affect the generated sql. For those which could not be changed at runtime, we are not including them in the hashcode or equals, hoping in doing so, we could minimize the effort in the calculation of cache key.
          Hide
          Patrick Linskey added a comment -

          More comments:

          • In JDBCConfigurationImpl: you've hard-coded the query sql cache to be a HashMap. Generally, we've tried to use the plugin infrastructure to allow people to vary the implementations based on their needs. It looks like you've taken this into account in your plugin class, but you never use the newInstance() method of that class, so my read is that the value will always be the hard-coded HashMap.
          • In JDBCFetchConfigurationImpl.JDBCConfigurationState, is there any reason why equals() and hashCode() for the inner class don't consider the fetchInnerJoins field? Also, in general, I try to keep the ordering of fields the same between the equals() / hashCode() block and the declarations, to make it easier to spot these sorts of differences.
          • In JDBCFetchConfigurationImpl, it's surprising that the equals() and hashCode() only consider fetch groups, and not any of the other things in JDBCFetchConfiguration or its superclass. What's the rationale behind that?

          More comments to come. In general, it looks like a promising approach, but I need to do more reading. Also, I've sent off a build with the patch to our performance team to run through some internal tests.

          Show
          Patrick Linskey added a comment - More comments: In JDBCConfigurationImpl: you've hard-coded the query sql cache to be a HashMap. Generally, we've tried to use the plugin infrastructure to allow people to vary the implementations based on their needs. It looks like you've taken this into account in your plugin class, but you never use the newInstance() method of that class, so my read is that the value will always be the hard-coded HashMap. In JDBCFetchConfigurationImpl.JDBCConfigurationState, is there any reason why equals() and hashCode() for the inner class don't consider the fetchInnerJoins field? Also, in general, I try to keep the ordering of fields the same between the equals() / hashCode() block and the declarations, to make it easier to spot these sorts of differences. In JDBCFetchConfigurationImpl, it's surprising that the equals() and hashCode() only consider fetch groups, and not any of the other things in JDBCFetchConfiguration or its superclass. What's the rationale behind that? More comments to come. In general, it looks like a promising approach, but I need to do more reading. Also, I've sent off a build with the patch to our performance team to run through some internal tests.
          Hide
          Patrick Linskey added a comment -

          Hi,

          In FetchConfigurationImpl.clearFetchGroups(), shouldn't the fetchGroupContainsDefault boolean be set to true, not false? Based on the FetchConfiguration.clearFetchGroups() javadoc, it's my understanding that after a clear, the default fetch group is active.

          Show
          Patrick Linskey added a comment - Hi, In FetchConfigurationImpl.clearFetchGroups(), shouldn't the fetchGroupContainsDefault boolean be set to true, not false? Based on the FetchConfiguration.clearFetchGroups() javadoc, it's my understanding that after a clear, the default fetch group is active.
          Hide
          Jeremy Bauer added a comment -

          Attaching extra doc

          Show
          Jeremy Bauer added a comment - Attaching extra doc
          Hide
          Jeremy Bauer added a comment -

          Attaching patch

          Show
          Jeremy Bauer added a comment - Attaching patch
          Hide
          Jeremy Bauer added a comment -

          Attaching the patch Kevin referred to in his previous post. The patch contains code updates, testcases, and OpenJPA documentation for a new QuerySQLCache property. This property provides configurable EMF-scope caching of select statements employed by the find operation.

          Fay Wang (another OpenJPA contributor), in addition to making a large contribution to the code, wrote the bulk of the extra documentation (QuerySQLCache.doc) to go along with this patch. It explains the performance issue and how it is addressed by this update.

          The patch also contains a couple miscellanous updates (most notably a change in FetchConfigurationImpl to eliminate hash queries for known values) that were flagged as target areas by a code profiler.

          Please review the patch and post comments.

          Thanks,
          Jeremy

          Show
          Jeremy Bauer added a comment - Attaching the patch Kevin referred to in his previous post. The patch contains code updates, testcases, and OpenJPA documentation for a new QuerySQLCache property. This property provides configurable EMF-scope caching of select statements employed by the find operation. Fay Wang (another OpenJPA contributor), in addition to making a large contribution to the code, wrote the bulk of the extra documentation (QuerySQLCache.doc) to go along with this patch. It explains the performance issue and how it is addressed by this update. The patch also contains a couple miscellanous updates (most notably a change in FetchConfigurationImpl to eliminate hash queries for known values) that were flagged as target areas by a code profiler. Please review the patch and post comments. Thanks, Jeremy
          Hide
          Kevin Sutter added a comment -

          Although the previously posted patch helped with performance on an individual EM basis, that was not sufficient when we pushed it through more scalable tests. So, the patch has been re-worked to cache across EMs. The basic concept is the same, but now the cache is longer lived and provides scalable performance improvements.

          As soon as the testcases and doc updates are done, the new patch will be posted for comments. Hopefully, this version can be committed in the near future.

          Thanks,
          Kevin

          Show
          Kevin Sutter added a comment - Although the previously posted patch helped with performance on an individual EM basis, that was not sufficient when we pushed it through more scalable tests. So, the patch has been re-worked to cache across EMs. The basic concept is the same, but now the cache is longer lived and provides scalable performance improvements. As soon as the testcases and doc updates are done, the new patch will be posted for comments. Hopefully, this version can be committed in the near future. Thanks, Kevin
          Hide
          Kevin Sutter added a comment -

          Attaching a new version of the first patch for this Issue. All of the stated comments about the first patch have been addressed. But, this is not the "final" patch. There are still some areas that could use some improvement, but I wanted to get the basic updates in place so that we can move forward. Also, by getting these into trunk, we can wring out any potential problems.

          This patch introduces a property (modeled after the QueryCompilationCache property):

          openjpa.jdbc.QuerySQLCache
          o true - uses CacheMap (default)
          o all - uses a ConcurrentHashMap
          o false - none

          As before, this improvement attempts to cache the SelectImpls and associated SQLBuffers for common findBy operations. The cache is scoped to an EntityManager instance.

          The patch also includes new trace messages for cache hits and misses.

          Any comments would be appreciated. Like I mentioned, I would like to integrate this much of the patch soon since this is not a full-time Issue for me and it's a pain to continually update my environment. If we're generally okay with the approach, then I'd like to commit and then move on to the next level of SQL generation caching.

          Thanks,
          Kevin

          Show
          Kevin Sutter added a comment - Attaching a new version of the first patch for this Issue. All of the stated comments about the first patch have been addressed. But, this is not the "final" patch. There are still some areas that could use some improvement, but I wanted to get the basic updates in place so that we can move forward. Also, by getting these into trunk, we can wring out any potential problems. This patch introduces a property (modeled after the QueryCompilationCache property): openjpa.jdbc.QuerySQLCache o true - uses CacheMap (default) o all - uses a ConcurrentHashMap o false - none As before, this improvement attempts to cache the SelectImpls and associated SQLBuffers for common findBy operations. The cache is scoped to an EntityManager instance. The patch also includes new trace messages for cache hits and misses. Any comments would be appreciated. Like I mentioned, I would like to integrate this much of the patch soon since this is not a full-time Issue for me and it's a pain to continually update my environment. If we're generally okay with the approach, then I'd like to commit and then move on to the next level of SQL generation caching. Thanks, Kevin
          Hide
          Patrick Linskey added a comment -

          > > Generally, the patch looks sound. I agree with Christiaan's concerns; we should probably
          > > change the caches to be configurable data structures.
          >
          > I'm not convinced that a configurable data structure is necessary. We want the performance
          > of OpenJPA to be excellent "out of the box". We made a similar decision when we decided to
          > turn the Compilation Cache on by default. Maybe that's what you mean... Make it configurable,
          > but we can have it turned on by default? If that's the case, then I'm okay with that approach.

          Yep. This definitely seems like the sort of thing that should be on by default, with a reasonable default hard ref size and a soft ref spillover.

          > We'll post a more complete patch in the near future.

          Excellent... I'm looking forward to reading more.

          Show
          Patrick Linskey added a comment - > > Generally, the patch looks sound. I agree with Christiaan's concerns; we should probably > > change the caches to be configurable data structures. > > I'm not convinced that a configurable data structure is necessary. We want the performance > of OpenJPA to be excellent "out of the box". We made a similar decision when we decided to > turn the Compilation Cache on by default. Maybe that's what you mean... Make it configurable, > but we can have it turned on by default? If that's the case, then I'm okay with that approach. Yep. This definitely seems like the sort of thing that should be on by default, with a reasonable default hard ref size and a soft ref spillover. > We'll post a more complete patch in the near future. Excellent... I'm looking forward to reading more.
          Hide
          Kevin Sutter added a comment -

          Thanks, Patrick, for the review. We are also performing some internal performance tests. We'll let the team know how they go shortly... We're also looking at other related improvements to this findBy path. We'll post a more complete patch in the near future.

          > Generally, the patch looks sound. I agree with Christiaan's concerns; we should probably change the caches to be configurable data structures.

          I'm not convinced that a configurable data structure is necessary. We want the performance of OpenJPA to be excellent "out of the box". We made a similar decision when we decided to turn the Compilation Cache on by default. Maybe that's what you mean... Make it configurable, but we can have it turned on by default? If that's the case, then I'm okay with that approach.

          > - JDBCStoreManager has a private configurable Map<ClassMapping,SelectImpl>. This might be a non-LRU CacheMap, and the configuration for it should be exposed via a JDBCConfiguration setting.

          See above comments.

          > - The Map<SelectImpl,SQLBuffer> is moved into SelectImpl itself, obviating the need for the second public map.

          Yep, already figured that one out. Thanks.

          Thanks again,
          Kevin

          Show
          Kevin Sutter added a comment - Thanks, Patrick, for the review. We are also performing some internal performance tests. We'll let the team know how they go shortly... We're also looking at other related improvements to this findBy path. We'll post a more complete patch in the near future. > Generally, the patch looks sound. I agree with Christiaan's concerns; we should probably change the caches to be configurable data structures. I'm not convinced that a configurable data structure is necessary. We want the performance of OpenJPA to be excellent "out of the box". We made a similar decision when we decided to turn the Compilation Cache on by default. Maybe that's what you mean... Make it configurable, but we can have it turned on by default? If that's the case, then I'm okay with that approach. > - JDBCStoreManager has a private configurable Map<ClassMapping,SelectImpl>. This might be a non-LRU CacheMap, and the configuration for it should be exposed via a JDBCConfiguration setting. See above comments. > - The Map<SelectImpl,SQLBuffer> is moved into SelectImpl itself, obviating the need for the second public map. Yep, already figured that one out. Thanks. Thanks again, Kevin
          Hide
          Patrick Linskey added a comment -

          Generally, the patch looks sound. I agree with Christiaan's concerns; we should probably change the caches to be configurable data structures.

          I think that from a data structure standpoint, I would prefer something like the following:

          • JDBCStoreManager has a private configurable Map<ClassMapping,SelectImpl>. This might be a non-LRU CacheMap, and the configuration for it should be exposed via a JDBCConfiguration setting.
          • The Map<SelectImpl,SQLBuffer> is moved into SelectImpl itself, obviating the need for the second public map.

          We have not yet run the patch against our internal performance tests; I'd like to hold off on that until we get the data structures finalized. (The requisite configuration changes can be done afterwards; they shouldn't impact runtime performance behavior.)

          Show
          Patrick Linskey added a comment - Generally, the patch looks sound. I agree with Christiaan's concerns; we should probably change the caches to be configurable data structures. I think that from a data structure standpoint, I would prefer something like the following: JDBCStoreManager has a private configurable Map<ClassMapping,SelectImpl>. This might be a non-LRU CacheMap, and the configuration for it should be exposed via a JDBCConfiguration setting. The Map<SelectImpl,SQLBuffer> is moved into SelectImpl itself, obviating the need for the second public map. We have not yet run the patch against our internal performance tests; I'd like to hold off on that until we get the data structures finalized. (The requisite configuration changes can be done afterwards; they shouldn't impact runtime performance behavior.)
          Hide
          Christiaan added a comment -

          I making this remark since I was triggered by the "I changed the 'sql' local variable in SelectImpl.execute() to be a member variable". I've seen a similar approach in RowImpl and as described in OPENJPA-474, I think this causes a lot of memory overhead when a many similar objects and modifications are involved, since the same sql is duplicated. Of course it is a good thing to cache for performance, but I do think a cache should use memory if it is available (not leading to an OoM) or at least try to keep it to a minimum. Not sure if it is the case with this patch, I just thought of making you aware of it.

          regards,
          Christiaan

          Show
          Christiaan added a comment - I making this remark since I was triggered by the "I changed the 'sql' local variable in SelectImpl.execute() to be a member variable". I've seen a similar approach in RowImpl and as described in OPENJPA-474 , I think this causes a lot of memory overhead when a many similar objects and modifications are involved, since the same sql is duplicated. Of course it is a good thing to cache for performance, but I do think a cache should use memory if it is available (not leading to an OoM) or at least try to keep it to a minimum. Not sure if it is the case with this patch, I just thought of making you aware of it. regards, Christiaan
          Hide
          Patrick Linskey added a comment -

          I haven't read the patch yet; I'll take a look at it later this week. However, without having looked at it, I'm hesitant about putting something like this into the 1.0.x branch, as it seems like it's probably a decently-large behavioral change, and thus likely to introduce instability. I'd prefer if we tried to keep our maintenance branch changes confined to bugfixes. Of course, performance issues tend to be hard to classify concretely as bugs vs. new features.

          Show
          Patrick Linskey added a comment - I haven't read the patch yet; I'll take a look at it later this week. However, without having looked at it, I'm hesitant about putting something like this into the 1.0.x branch, as it seems like it's probably a decently-large behavioral change, and thus likely to introduce instability. I'd prefer if we tried to keep our maintenance branch changes confined to bugfixes. Of course, performance issues tend to be hard to classify concretely as bugs vs. new features.
          Hide
          Kevin Sutter added a comment -

          Instead of trying to resolve the whole sql caching problem, how about if we just start with caching the SQL for the findBy operation? Each Entity type has a unique SelectImpl and SQLBuffer associated with processing a find(class, id) operation, right? So, why not cache these results so that they can be re-used the next time the find operation is invoked?

          Looking back at Patrick's first patch for this issue, it looks like a few of the original ideas are duplicated in my patch. I am making the assumption that the EntityManager (and the associated JDBCStoreManager) is single-threaded. I am also making the assumption that the find method (and only the find method) utilizes the getInitializeStateResult() on the JDBCStoreManager.

          This is still a prototype at this point, but I would like to get some input as to the direction. The initial results look very promising. I've driven it with many clients with the findBy's performance increasing anywhere from 5-20% depending on the number of clients and machine capacity. Of course, this is limited to findBy's at this point, but if the general direction is okay, maybe we can extend it to more general queries.

          Just looking for input at this point. BTW, my findBy.patch file works with the 1.0.x branch. Still hoping we can improve the performance of the service stream....

          Kevin

          Show
          Kevin Sutter added a comment - Instead of trying to resolve the whole sql caching problem, how about if we just start with caching the SQL for the findBy operation? Each Entity type has a unique SelectImpl and SQLBuffer associated with processing a find(class, id) operation, right? So, why not cache these results so that they can be re-used the next time the find operation is invoked? Looking back at Patrick's first patch for this issue, it looks like a few of the original ideas are duplicated in my patch. I am making the assumption that the EntityManager (and the associated JDBCStoreManager) is single-threaded. I am also making the assumption that the find method (and only the find method) utilizes the getInitializeStateResult() on the JDBCStoreManager. This is still a prototype at this point, but I would like to get some input as to the direction. The initial results look very promising. I've driven it with many clients with the findBy's performance increasing anywhere from 5-20% depending on the number of clients and machine capacity. Of course, this is limited to findBy's at this point, but if the general direction is okay, maybe we can extend it to more general queries. Just looking for input at this point. BTW, my findBy.patch file works with the 1.0.x branch. Still hoping we can improve the performance of the service stream.... Kevin
          Hide
          Pinaki Poddar added a comment -

          > I think that it might be ok to ignore fetch, since I think it's the same as the one that was used to create the SelectImpl
          Did not see the patch but a point comment is FetchConfiguration in constructing a Select starts from a 'root' instance and then spawns other FetchConfiguration instances as it traverses the fetch graph. But the spawned as well as the root instance shares the same 'state' (ConfigurationState). They only differ by 'dynamic' state that tracks the current location in fetch graph traversal. Effectively the shared root instances would be identify a fetch configuration if one is considering to use it as a part of a lookup key.

          Show
          Pinaki Poddar added a comment - > I think that it might be ok to ignore fetch, since I think it's the same as the one that was used to create the SelectImpl Did not see the patch but a point comment is FetchConfiguration in constructing a Select starts from a 'root' instance and then spawns other FetchConfiguration instances as it traverses the fetch graph. But the spawned as well as the root instance shares the same 'state' (ConfigurationState). They only differ by 'dynamic' state that tracks the current location in fetch graph traversal. Effectively the shared root instances would be identify a fetch configuration if one is considering to use it as a part of a lookup key.
          Hide
          Patrick Linskey added a comment -

          I'm having a hard time reproducing both the 7.5 and the 5.5 numbers; I'm seeing numbers around 9 minutes for both clean trunk and my patch + prototype. The new stuff seems to be running a small amount faster, but not as much as I saw before. Odd.

          It is probably worthwhile to create a microbenchmark that exercises these pathways and try it out with that directly, instead of trying to infer something from our existing test execution speed.

          Show
          Patrick Linskey added a comment - I'm having a hard time reproducing both the 7.5 and the 5.5 numbers; I'm seeing numbers around 9 minutes for both clean trunk and my patch + prototype. The new stuff seems to be running a small amount faster, but not as much as I saw before. Odd. It is probably worthwhile to create a microbenchmark that exercises these pathways and try it out with that directly, instead of trying to infer something from our existing test execution speed.
          Hide
          Patrick Linskey added a comment -

          The change was trivial: I changed the 'sql' local variable in SelectImpl.execute() to be a member variable, and added a null check to control whether or not to go to the DBDictionary to create a new one.

          This depends on the original patch; the original patch lets us cache SelectImpls. Without that patch, the SQLBuffer would always be uninitialized anyways.

          I don't think that the lack of thread-safety in my prototype is an issue, since in the worst case, we just end up over-creating SQLBuffers. However, I am concerned about the fact that I'm ignoring the forUpdate and fetch parameters that are passed to execute(). I think that it might be ok to ignore fetch, since I think it's the same as the one that was used to create the SelectImpl, and is just being passed through for use by the DBDictionary. But I need to look at that more closely, and also do the same analysis for the forUpdate statement.

          But first I want to reproduce the numbers that I was seeing, to make sure that this is a worthwhile avenue.

          Show
          Patrick Linskey added a comment - The change was trivial: I changed the 'sql' local variable in SelectImpl.execute() to be a member variable, and added a null check to control whether or not to go to the DBDictionary to create a new one. This depends on the original patch; the original patch lets us cache SelectImpls. Without that patch, the SQLBuffer would always be uninitialized anyways. I don't think that the lack of thread-safety in my prototype is an issue, since in the worst case, we just end up over-creating SQLBuffers. However, I am concerned about the fact that I'm ignoring the forUpdate and fetch parameters that are passed to execute(). I think that it might be ok to ignore fetch, since I think it's the same as the one that was used to create the SelectImpl, and is just being passed through for use by the DBDictionary. But I need to look at that more closely, and also do the same analysis for the forUpdate statement. But first I want to reproduce the numbers that I was seeing, to make sure that this is a worthwhile avenue.
          Hide
          Kevin Sutter added a comment -

          Wow! That does sound significant. So, what were the SQLBuffer enhancements? And, would those changes be sufficient? Or, are they dependent on the other changes in your original patch?

          Kevin

          Show
          Kevin Sutter added a comment - Wow! That does sound significant. So, what were the SQLBuffer enhancements? And, would those changes be sufficient? Or, are they dependent on the other changes in your original patch? Kevin
          Hide
          Patrick Linskey added a comment -

          In case you're interested, my numbers are based on running the full OpenJPA test suite on the same hardware, with and without the changes. Without them (clean trunk build), the JDBC tests take around 7.5 minutes, and I just got a run in 5.5 minutes with the attached patch + my SQLBuffer prototype.

          This is a significant speedup, especially considering that our tests generally probably don't reuse statements as much as the average database application, since we generate a lot of different SQL and don't generally do the same thing too many times.

          Show
          Patrick Linskey added a comment - In case you're interested, my numbers are based on running the full OpenJPA test suite on the same hardware, with and without the changes. Without them (clean trunk build), the JDBC tests take around 7.5 minutes, and I just got a run in 5.5 minutes with the attached patch + my SQLBuffer prototype. This is a significant speedup, especially considering that our tests generally probably don't reuse statements as much as the average database application, since we generate a lot of different SQL and don't generally do the same thing too many times.
          Hide
          Patrick Linskey added a comment -

          I just prototyped caching SQLBuffers inside SelectImpl.execute(), and am now seeing relatively significant speedups. It's probably still worth looking into why things slowed down without that extra level of caching (there might be some more gains to be had there), but I'll explore that direction a bit more later this week.

          Show
          Patrick Linskey added a comment - I just prototyped caching SQLBuffers inside SelectImpl.execute(), and am now seeing relatively significant speedups. It's probably still worth looking into why things slowed down without that extra level of caching (there might be some more gains to be had there), but I'll explore that direction a bit more later this week.
          Hide
          Patrick Linskey added a comment -

          The Select interface and its implementations are responsible for both assembling and executing SELECT statements (and, for legacy reasons, some other statement types). So, it seemed as though reusing the Select instances was the right level of abstraction.

          My change actually isn't all that involved; it's mostly just a refactoring, and not much new code. The refactoring was necessary because a bunch of things involved in SQL generation examined a StateManager directly, and so to properly build a key for the Select, I needed to include all those bits of information. I could have actually left most of the classes alone and just made the key account for those bits of information, but I decided that that approach would be error-prone, both up-front (what if I missed some state manager interrogation?) and moving forward (what if the downstream methods started grabbing more data from the state manager?).

          Additionally, my change does not currently actually cache the SQLBuffers generated within the SelectImpls. It is possible that this is why there is no performance gain. The performance loss in our tests might be attributable to the extra overhead of always computing all the data needed for the SelectKey even if it's not needed in the end.

          Also, note that my changes right now only are used when loading a single instance, not when executing a Query. I expect that the techniques and implementation could be broadened to JPQL cases as well, but I didn't really look at that use case, since the problem description seemed to be identifying relation traversals and finds as the problems.

          > I was actually thinking of scoping the cache to an EntityManager (broker)
          > instance. Maybe not as useful as across all active brokers, but we would
          > still gain a sizeable benefit (hopefully).

          I think that the deciding factor for this will be whether or not there is any Broker-specific state that must be involved in the caches.

          Show
          Patrick Linskey added a comment - The Select interface and its implementations are responsible for both assembling and executing SELECT statements (and, for legacy reasons, some other statement types). So, it seemed as though reusing the Select instances was the right level of abstraction. My change actually isn't all that involved; it's mostly just a refactoring, and not much new code. The refactoring was necessary because a bunch of things involved in SQL generation examined a StateManager directly, and so to properly build a key for the Select, I needed to include all those bits of information. I could have actually left most of the classes alone and just made the key account for those bits of information, but I decided that that approach would be error-prone, both up-front (what if I missed some state manager interrogation?) and moving forward (what if the downstream methods started grabbing more data from the state manager?). Additionally, my change does not currently actually cache the SQLBuffers generated within the SelectImpls. It is possible that this is why there is no performance gain. The performance loss in our tests might be attributable to the extra overhead of always computing all the data needed for the SelectKey even if it's not needed in the end. Also, note that my changes right now only are used when loading a single instance, not when executing a Query. I expect that the techniques and implementation could be broadened to JPQL cases as well, but I didn't really look at that use case, since the problem description seemed to be identifying relation traversals and finds as the problems. > I was actually thinking of scoping the cache to an EntityManager (broker) > instance. Maybe not as useful as across all active brokers, but we would > still gain a sizeable benefit (hopefully). I think that the deciding factor for this will be whether or not there is any Broker-specific state that must be involved in the caches.
          Hide
          Kevin Sutter added a comment -

          Took a quick look at your patch. Much more involved than my initial experimentation. I was actually looking at it from a Query viewpoint and I was thinking of caching the generated sql along with the query compilation cache. That is, if a specific jpql query was cached after it was parsed and compiled, then couldn't we tag along the cached generated sql. As my previous notes on the dev mailing list indicated, we would still have to take into account the variable aspects of the query (parameters, hints, result size, etc). So, maybe going this route would have hit just as many parts. But, your patch was much more extensive than I was hoping for. Guess this shows how tight the sql generation and the "query processing" really is.

          Figuring out where to base the cache was also a question for me. Like you have found out, if you do it at too high of a level then we'll hit synchronization issues. If I put it at too low of a level, then it turned out to be useless. I was actually thinking of scoping the cache to an EntityManager (broker) instance. Maybe not as useful as across all active brokers, but we would still gain a sizeable benefit (hopefully).

          Those are my initial thoughts. To be honest, I haven't completely grasped your proposed changes, so I don't have any specific comments on your changes yet...

          Kevin

          Show
          Kevin Sutter added a comment - Took a quick look at your patch. Much more involved than my initial experimentation. I was actually looking at it from a Query viewpoint and I was thinking of caching the generated sql along with the query compilation cache. That is, if a specific jpql query was cached after it was parsed and compiled, then couldn't we tag along the cached generated sql. As my previous notes on the dev mailing list indicated, we would still have to take into account the variable aspects of the query (parameters, hints, result size, etc). So, maybe going this route would have hit just as many parts. But, your patch was much more extensive than I was hoping for. Guess this shows how tight the sql generation and the "query processing" really is. Figuring out where to base the cache was also a question for me. Like you have found out, if you do it at too high of a level then we'll hit synchronization issues. If I put it at too low of a level, then it turned out to be useless. I was actually thinking of scoping the cache to an EntityManager (broker) instance. Maybe not as useful as across all active brokers, but we would still gain a sizeable benefit (hopefully). Those are my initial thoughts. To be honest, I haven't completely grasped your proposed changes, so I don't have any specific comments on your changes yet... Kevin
          Hide
          Patrick Linskey added a comment -

          The attached patch reuses Select instances across all active Brokers from a given BrokerFactory. We need to do more careful analysis to make sure that Select instances are thread-safe once they've been created. Adding synchronization to the Select implementations will probably not help, since this would likely add a scalability bottleneck.

          Show
          Patrick Linskey added a comment - The attached patch reuses Select instances across all active Brokers from a given BrokerFactory. We need to do more careful analysis to make sure that Select instances are thread-safe once they've been created. Adding synchronization to the Select implementations will probably not help, since this would likely add a scalability bottleneck.
          Hide
          Patrick Linskey added a comment -

          This patch seems like it addresses some of the right areas to make improvements on this issue, at least for lookups by id (find() and lazy-loads). It builds a cache of Select instances keyed off of relevant data, and uses this cache in JDBCStoreManager to avoid recreating Select records.

          However, initial testing seems to indicate that there's actually a performance loss. Additionally, all the OpenJPA tests are passing, but a small number of the TCK tests fail with this change.

          Show
          Patrick Linskey added a comment - This patch seems like it addresses some of the right areas to make improvements on this issue, at least for lookups by id (find() and lazy-loads). It builds a cache of Select instances keyed off of relevant data, and uses this cache in JDBCStoreManager to avoid recreating Select records. However, initial testing seems to indicate that there's actually a performance loss. Additionally, all the OpenJPA tests are passing, but a small number of the TCK tests fail with this change.

            People

            • Assignee:
              Fay Wang
              Reporter:
              Patrick Linskey
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development