Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0-beta4
    • Component/s: jackrabbit-core
    • Labels:
      None

      Description

      Jackrabbit should use database connection pools instead of a single connection per persistence manager, cluster journal, or database data store.

      1. 777490.patch
        85 kB
        Matej Knopp
      2. dbcp.patch
        8 kB
        Jukka Zitting
      3. dbcp.patch
        8 kB
        Jukka Zitting
      4. JCR-1456.patch
        268 kB
        Jukka Zitting
      5. JCR-1456-performance.txt
        6 kB
        Martijn Hendriks
      6. JCR-1456-performance-trunk-test-setup.patch
        44 kB
        Martijn Hendriks
      7. patch-1456-1.txt
        94 kB
        Matej Knopp
      8. patch-1456-2.txt
        237 kB
        Matej Knopp
      9. patch-1456-3.txt
        242 kB
        Matej Knopp

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Resolved Resolved
        638d 18h 31m 1 Jukka Zitting 04/Dec/09 10:41
        Resolved Resolved Closed Closed
        38d 3h 37m 1 Jukka Zitting 11/Jan/10 14:18
        Jukka Zitting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Jukka Zitting made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 2.0-beta4 [ 12314412 ]
        Fix Version/s 2.0.0 [ 12312449 ]
        Resolution Fixed [ 1 ]
        Hide
        Jukka Zitting added a comment -

        Resolving this as Fixed for 2.0-beta4 since all the basic work is now in trunk and we should record this changes as having occurred in time for 2.0-beta4. Let's use separate issues for any remaining issues or improvements related to database connection pooling.

        Show
        Jukka Zitting added a comment - Resolving this as Fixed for 2.0-beta4 since all the basic work is now in trunk and we should record this changes as having occurred in time for 2.0-beta4. Let's use separate issues for any remaining issues or improvements related to database connection pooling.
        Hide
        Jukka Zitting added a comment -

        The pooled bundle persistence managers are now in o.a.j.core.persistence.pool and the original bundle PMs in o.a.j.core.persistence.bundle as before.

        With that change in place I've now merged all the changes from the JCR-1456 sandbox branch back to trunk. Further work on this issue should happen in trunk.

        Show
        Jukka Zitting added a comment - The pooled bundle persistence managers are now in o.a.j.core.persistence.pool and the original bundle PMs in o.a.j.core.persistence.bundle as before. With that change in place I've now merged all the changes from the JCR-1456 sandbox branch back to trunk. Further work on this issue should happen in trunk.
        Hide
        Stefan Guggisberg added a comment -

        > I'd keep the non-pooled stuff around just for persistence managers since they're the most critical part of the system.

        that's ok with me.

        Show
        Stefan Guggisberg added a comment - > I'd keep the non-pooled stuff around just for persistence managers since they're the most critical part of the system. that's ok with me.
        Hide
        Jukka Zitting added a comment -

        OK, let's go with the separate package. Should we do this just for the persistence managers, or all of the affected pieces (fs, data store, journal)? I'd keep the non-pooled stuff around just for persistence managers since they're the most critical part of the system.

        Show
        Jukka Zitting added a comment - OK, let's go with the separate package. Should we do this just for the persistence managers, or all of the affected pieces (fs, data store, journal)? I'd keep the non-pooled stuff around just for persistence managers since they're the most critical part of the system.
        Hide
        Martijn Hendriks added a comment -

        > are there any known issues?

        Besides the minor performance degradation there are none.

        Show
        Martijn Hendriks added a comment - > are there any known issues? Besides the minor performance degradation there are none.
        Hide
        Stefan Guggisberg added a comment -

        > Would you be fine with an option that made the connection pool contain just a single persistent connection (which would essentially match current functionality), or would you rather keep the current code as is and introduce the pool-enabled code in separate packages (see my earlier comment from 30/May/09)?

        personally i'd prefer the latter.

        Show
        Stefan Guggisberg added a comment - > Would you be fine with an option that made the connection pool contain just a single persistent connection (which would essentially match current functionality), or would you rather keep the current code as is and introduce the pool-enabled code in separate packages (see my earlier comment from 30/May/09)? personally i'd prefer the latter.
        Hide
        Jukka Zitting added a comment -

        > +1 for merging it to trunk if pooling would be an optional feature

        Would you be fine with an option that made the connection pool contain just a single persistent connection (which would essentially match current functionality), or would you rather keep the current code as is and introduce the pool-enabled code in separate packages (see my earlier comment from 30/May/09)?

        Show
        Jukka Zitting added a comment - > +1 for merging it to trunk if pooling would be an optional feature Would you be fine with an option that made the connection pool contain just a single persistent connection (which would essentially match current functionality), or would you rather keep the current code as is and introduce the pool-enabled code in separate packages (see my earlier comment from 30/May/09)?
        Hide
        Stefan Guggisberg added a comment -

        > It would mean changing the current pm implementations. So pooling wouldn't be optional.

        ok, assuming that there are no known issues at this time:

        -0 for merging it to trunk

        +1 for merging it to trunk if pooling would be an optional feature

        Show
        Stefan Guggisberg added a comment - > It would mean changing the current pm implementations. So pooling wouldn't be optional. ok, assuming that there are no known issues at this time: -0 for merging it to trunk +1 for merging it to trunk if pooling would be an optional feature
        Hide
        Thomas Mueller added a comment -

        It would mean changing the current pm implementations. So pooling wouldn't be optional.

        Show
        Thomas Mueller added a comment - It would mean changing the current pm implementations. So pooling wouldn't be optional.
        Hide
        Stefan Guggisberg added a comment -

        > Any objections to merging this work to Jackrabbit trunk? The branch looks pretty good to me now, and I think any remaining issues are best solved in trunk where it's easier for more people to try out and look at the code.

        are there any known issues?

        would merging to trunk mean changing the current pm implementions or would the connection pooling be an optional feature?

        Show
        Stefan Guggisberg added a comment - > Any objections to merging this work to Jackrabbit trunk? The branch looks pretty good to me now, and I think any remaining issues are best solved in trunk where it's easier for more people to try out and look at the code. are there any known issues? would merging to trunk mean changing the current pm implementions or would the connection pooling be an optional feature?
        Hide
        Thomas Mueller added a comment -

        +1 merge

        Show
        Thomas Mueller added a comment - +1 merge
        Hide
        Jukka Zitting added a comment -

        Any objections to merging this work to Jackrabbit trunk? The branch looks pretty good to me now, and I think any remaining issues are best solved in trunk where it's easier for more people to try out and look at the code.

        Show
        Jukka Zitting added a comment - Any objections to merging this work to Jackrabbit trunk? The branch looks pretty good to me now, and I think any remaining issues are best solved in trunk where it's easier for more people to try out and look at the code.
        Hide
        Martijn Hendriks added a comment -

        (Sorry for the late reply...)

        > Strange is that Connection.getAutoCommit() is called so much (maybe 50% of all JDBC method calls). Sometimes it is called 4 times in a row

        The commons-dbcp pool calls getAutoCommit on each borrow and on each return and ConnectionHelper.getConnection also calls it. That's three. I agree that the number of calls to getAutoCommit is very large: 28756 vs 759 on the trunk for the tests.

        > I just tested H2 embedded. I don't know why H2 got slower in your case

        I see the same: H2 embedded is just a couple of seconds slower. Using H2 in server mode over TCP (localhost), however, (using tracing or not) is significantly slower. This might have something to do with the large number of getAutoCommit calls....?

        > The AutoCommit stuff shouldn't be needed with connection pooling anymore, the AutoCommit mode should simply always be off.

        I don't think that changing the default for autoCommit changes the number of calls to getAutoCommit. Can we just keep this default or is there another reason to make the default "false"?

        Show
        Martijn Hendriks added a comment - (Sorry for the late reply...) > Strange is that Connection.getAutoCommit() is called so much (maybe 50% of all JDBC method calls). Sometimes it is called 4 times in a row The commons-dbcp pool calls getAutoCommit on each borrow and on each return and ConnectionHelper.getConnection also calls it. That's three. I agree that the number of calls to getAutoCommit is very large: 28756 vs 759 on the trunk for the tests. > I just tested H2 embedded. I don't know why H2 got slower in your case I see the same: H2 embedded is just a couple of seconds slower. Using H2 in server mode over TCP (localhost), however, (using tracing or not) is significantly slower. This might have something to do with the large number of getAutoCommit calls....? > The AutoCommit stuff shouldn't be needed with connection pooling anymore, the AutoCommit mode should simply always be off. I don't think that changing the default for autoCommit changes the number of calls to getAutoCommit. Can we just keep this default or is there another reason to make the default "false"?
        Hide
        Jukka Zitting added a comment -

        The AutoCommit stuff shouldn't be needed with connection pooling anymore, the AutoCommit mode should simply always be off.

        We use the AutoCommit mode to avoid having to add explicit commit() calls even after read-only operations. It's being switched on an off depending whether the repository is performing a read or a write operation. Now with the connection pool a connection is simply closed (or reclaimed to the pool) after a read operation ends, so no pending transaction state starts to accumulate on the database side.

        Show
        Jukka Zitting added a comment - The AutoCommit stuff shouldn't be needed with connection pooling anymore, the AutoCommit mode should simply always be off. We use the AutoCommit mode to avoid having to add explicit commit() calls even after read-only operations. It's being switched on an off depending whether the repository is performing a read or a write operation. Now with the connection pool a connection is simply closed (or reclaimed to the pool) after a read operation ends, so no pending transaction state starts to accumulate on the database side.
        Hide
        Thomas Mueller added a comment -

        Strange is that Connection.getAutoCommit() is called so much (maybe 50% of all JDBC method calls). Sometimes it is called 4 times in a row, without any other JDBC calls in between. It's not a problem for most databases (specially embedded), but I wonder why it is called so much and if this could be avoided.

        I just tested H2 embedded. I don't know why H2 got slower in your case, maybe because you set the trace level to the maximum, or because you have used the server mode (I used embedded and disabled the trace output).

        127 seconds with trunk
        131 seconds with JCR-1456
        Maven 2.0.9, 1.5.0_20, Mac OS 10.5.8

        Unrelated to JCR-1456: I had to disable the H2 shutdown hook because one of tests doesn't seem to close the repository correctly, so that Jackrabbit executes database statements in a shutdown hook. This only happens in the trunk, not in JCR-1456.

        Show
        Thomas Mueller added a comment - Strange is that Connection.getAutoCommit() is called so much (maybe 50% of all JDBC method calls). Sometimes it is called 4 times in a row, without any other JDBC calls in between. It's not a problem for most databases (specially embedded), but I wonder why it is called so much and if this could be avoided. I just tested H2 embedded. I don't know why H2 got slower in your case, maybe because you set the trace level to the maximum, or because you have used the server mode (I used embedded and disabled the trace output). 127 seconds with trunk 131 seconds with JCR-1456 Maven 2.0.9, 1.5.0_20, Mac OS 10.5.8 Unrelated to JCR-1456 : I had to disable the H2 shutdown hook because one of tests doesn't seem to close the repository correctly, so that Jackrabbit executes database statements in a shutdown hook. This only happens in the trunk, not in JCR-1456 .
        Hide
        Stefan Guggisberg added a comment -

        thanks for sharing the results. they do look very promising.
        i am pretty sure thomas has an idea how to explain/address
        to +50% on h2.

        Show
        Stefan Guggisberg added a comment - thanks for sharing the results. they do look very promising. i am pretty sure thomas has an idea how to explain/address to +50% on h2.
        Martijn Hendriks made changes -
        Attachment JCR-1456-performance.txt [ 12422994 ]
        Attachment JCR-1456-performance-trunk-test-setup.patch [ 12422995 ]
        Hide
        Martijn Hendriks added a comment -

        I attached the results of the performance test and also a patch against the trunk which I applied to setup the tests. There seems to be some overhead as a result of the patch. One case, however, shows quite dramatic performance loss (50%). I want to find out what causes this.

        Show
        Martijn Hendriks added a comment - I attached the results of the performance test and also a patch against the trunk which I applied to setup the tests. There seems to be some overhead as a result of the patch. One case, however, shows quite dramatic performance loss (50%). I want to find out what causes this.
        Hide
        Martijn Hendriks added a comment -

        Sure, I'll describe what I did as precisely as possible. All thanks for your feedback.

        Show
        Martijn Hendriks added a comment - Sure, I'll describe what I did as precisely as possible. All thanks for your feedback.
        Hide
        Thomas Mueller added a comment -

        > Is that enough

        That's enough, thanks.

        Could you post the test code / setup or describe what you tested? So that we can reproduce the results if needed. Just to protect from those who say "The new version feels slower..."

        Show
        Thomas Mueller added a comment - > Is that enough That's enough, thanks. Could you post the test code / setup or describe what you tested? So that we can reproduce the results if needed. Just to protect from those who say "The new version feels slower..."
        Hide
        Stefan Guggisberg added a comment -

        > [...] Is that enough, or should we do some more measurements?

        that would be fine with me, thanks.

        Show
        Stefan Guggisberg added a comment - > [...] Is that enough, or should we do some more measurements? that would be fine with me, thanks.
        Hide
        Martijn Hendriks added a comment -

        I think that the "testWhileIdle" approach for the DataSources managed by Jackrabbit resolves the performance issue (this is already present in the sandbox branch). I will try to get some more test results tomorrow. What I intend to do is measure the time that it takes to build the jackrabbit-core up to the integration-test phase on MySQL, MSSQL, H2 and Oracle backends. I compare the sandbox branch with a close revision in the trunk. I hope that these build-times are approximately the same. Is that enough, or should we do some more measurements?

        Show
        Martijn Hendriks added a comment - I think that the "testWhileIdle" approach for the DataSources managed by Jackrabbit resolves the performance issue (this is already present in the sandbox branch). I will try to get some more test results tomorrow. What I intend to do is measure the time that it takes to build the jackrabbit-core up to the integration-test phase on MySQL, MSSQL, H2 and Oracle backends. I compare the sandbox branch with a close revision in the trunk. I hope that these build-times are approximately the same. Is that enough, or should we do some more measurements?
        Hide
        Stefan Guggisberg added a comment -

        > Is the 20% slowdown problem solved? I think that should be solved before merging to trunk (disabling validation or doing validation on idle).

        i agree with thomas. i'd be okay to merging these changes to trunk if the performance issue has been resolved.

        Show
        Stefan Guggisberg added a comment - > Is the 20% slowdown problem solved? I think that should be solved before merging to trunk (disabling validation or doing validation on idle). i agree with thomas. i'd be okay to merging these changes to trunk if the performance issue has been resolved.
        Hide
        Thomas Mueller added a comment -

        Is the 20% slowdown problem solved? I think that should be solved before merging to trunk (disabling validation or doing validation on idle).

        P.S. JCR-1456.patch looks like a 'reverse patch'.

        Show
        Thomas Mueller added a comment - Is the 20% slowdown problem solved? I think that should be solved before merging to trunk (disabling validation or doing validation on idle). P.S. JCR-1456 .patch looks like a 'reverse patch'.
        Jukka Zitting made changes -
        Attachment JCR-1456.patch [ 12422801 ]
        Hide
        Jukka Zitting added a comment -

        Attached a patch showing the full set of changes between the JCR-1456 sandbox branch and the latest trunk.

        Anyone opposed to merging these changes to trunk? There's obviously still some work to be done, but I think the current state is already good enough to be included in the 2.0 beta releases.

        Show
        Jukka Zitting added a comment - Attached a patch showing the full set of changes between the JCR-1456 sandbox branch and the latest trunk. Anyone opposed to merging these changes to trunk? There's obviously still some work to be done, but I think the current state is already good enough to be included in the 2.0 beta releases.
        Hide
        Jukka Zitting added a comment -

        I gave a quick look at the current state in the sandbox branch, and I'm pretty happy with how this has turned out!

        Agreed about targeting this for Jackrabbit 2.0. We should start looking at merging the changes back to trunk so they'll go out in the 2.0 betas for more testing before the final 2.0 release.

        Show
        Jukka Zitting added a comment - I gave a quick look at the current state in the sandbox branch, and I'm pretty happy with how this has turned out! Agreed about targeting this for Jackrabbit 2.0. We should start looking at merging the changes back to trunk so they'll go out in the 2.0 betas for more testing before the final 2.0 release.
        Martijn Hendriks made changes -
        Fix Version/s 2.0.0 [ 12312449 ]
        Hide
        Martijn Hendriks added a comment -

        I think this should be fixed in 2.0.0.

        Show
        Martijn Hendriks added a comment - I think this should be fixed in 2.0.0.
        Hide
        Martijn Hendriks added a comment -

        I think that it is a good point to ask some feedback about the direction that the work on the sandbox branch is taking.

        What has been done:

        • A getDataSource method has been added to the ConnectionFactory which creates and returns a pooled datasource (commons-dbcp).
        • Maven profiles and some infrastructure have been added which make running the automated tests against different DB backends easier.
        • A DB uitility packages has been added: o.a.j.c.util.db. Classes from o.a.j.c.persistence.bundle.util have been moved there, and most importantly, it contains the ConnectionHelper class hierarchy. This hierarchy uses a DataSource provided by the Connectionfactory and provides a means to execute SQL and has specializations for several DB types. Most notably, the Oracle10R1ConnectionHelper implements special blob handling. It also contains a CheckSchemaOperation class which encapsulates the logic to check and create DB schemas (using a ConnectionHelper).
        • The bundle PMs and the DB Filesystem classes have been refactored to use the ConnectionHelper and CheckSchemaOperation classes. The PM and FS classes serve as factories for ConnectionHelper and CheckSchemaOperation instances.

        What must still be done:

        • Refactor remaining db based packages to use the ConnectionHelper and CheckSchemaOperation. (journal, datastore, and maybe the non-bundle pms).
        • Improve the implementation of Connectionfactory.getDataSource (now it creates a new DataSource for each invocation with default properties....)
        • ....
        • A lot of integration testing

        So what do you think about the current direction?

        Show
        Martijn Hendriks added a comment - I think that it is a good point to ask some feedback about the direction that the work on the sandbox branch is taking. What has been done: A getDataSource method has been added to the ConnectionFactory which creates and returns a pooled datasource (commons-dbcp). Maven profiles and some infrastructure have been added which make running the automated tests against different DB backends easier. A DB uitility packages has been added: o.a.j.c.util.db. Classes from o.a.j.c.persistence.bundle.util have been moved there, and most importantly, it contains the ConnectionHelper class hierarchy. This hierarchy uses a DataSource provided by the Connectionfactory and provides a means to execute SQL and has specializations for several DB types. Most notably, the Oracle10R1ConnectionHelper implements special blob handling. It also contains a CheckSchemaOperation class which encapsulates the logic to check and create DB schemas (using a ConnectionHelper). The bundle PMs and the DB Filesystem classes have been refactored to use the ConnectionHelper and CheckSchemaOperation classes. The PM and FS classes serve as factories for ConnectionHelper and CheckSchemaOperation instances. What must still be done: Refactor remaining db based packages to use the ConnectionHelper and CheckSchemaOperation. (journal, datastore, and maybe the non-bundle pms). Improve the implementation of Connectionfactory.getDataSource (now it creates a new DataSource for each invocation with default properties....) .... A lot of integration testing So what do you think about the current direction?
        Hide
        Martijn Hendriks added a comment -

        I've taken the liberty to take the next step: see revision 801659. I've replaced the DataSource field with a ConnectionHelper field, removed the Context from the method signatures and basically put the Context's state in the ConnectionHelper. I've also moved checkSchema and prepareSchemaObjectPrefix to the ConnectionHelper class. Database specific code is now contained in the ConnectionHelper class hierarchy and this hierarchy can be reused in other database dependent packages. The BundleDbPM's init method now calls a "createConnectionHelper" method which subclasses can override to use a specialized ConnectionHelper.

        At least the following builds succeed:

        • mvn clean integration-test (Derby and local FS backend)
        • mvn clean test -Puse-descriptor-overlay,mysql (MySQL backend, ignoring GCTest failure)
        • mvn clean test -Puse-descriptor-overlay,mssql (MSSQL backend, ignoring GCTest failure)

        Known open issues:

        • The H2, Oracle and Postgres PMs are broken (I've made it compile by commenting code out)
        • Make the pooling smarter (ConnectionFactory)
        • Add pooling to other DB based packages: db fs, db journal, db datastore, regular db pms.
        • blockOnConnectionLoss feature and retrying on failure strategy must be checked.
        Show
        Martijn Hendriks added a comment - I've taken the liberty to take the next step: see revision 801659. I've replaced the DataSource field with a ConnectionHelper field, removed the Context from the method signatures and basically put the Context's state in the ConnectionHelper. I've also moved checkSchema and prepareSchemaObjectPrefix to the ConnectionHelper class. Database specific code is now contained in the ConnectionHelper class hierarchy and this hierarchy can be reused in other database dependent packages. The BundleDbPM's init method now calls a "createConnectionHelper" method which subclasses can override to use a specialized ConnectionHelper. At least the following builds succeed: mvn clean integration-test (Derby and local FS backend) mvn clean test -Puse-descriptor-overlay,mysql (MySQL backend, ignoring GCTest failure) mvn clean test -Puse-descriptor-overlay,mssql (MSSQL backend, ignoring GCTest failure) Known open issues: The H2, Oracle and Postgres PMs are broken (I've made it compile by commenting code out) Make the pooling smarter (ConnectionFactory) Add pooling to other DB based packages: db fs, db journal, db datastore, regular db pms. blockOnConnectionLoss feature and retrying on failure strategy must be checked.
        Hide
        Martijn Hendriks added a comment -

        Committed in revision 800118. Using a different DB backend for testing can be done as follows:

        • Edit the relevant properties in the pom of the jackrabbit-core's use-descriptor-overlay profile.
        • Make sure you have the appropriate DB driver on the classpath (a MySQL driver is already there)
        • Run mvn clean integration-test -Puse-descriptor-overlay

        Note that the profile drops and recreates the test database in the clean phase.

        I recently have been looking at refactoring the database classes a bit to remove duplication in e.g., all these bean properties (username, password, schema object prefix, etc) and, more importantly, methods like checkSchema. The idea was to have a base class, say DbSupport, with all these common properties and methods and and with a method to get a sort of JDBC helper class which encapsulates the Connection and operations on it (something like the ConnectionRecoveryManager). I had various subclasses of the JDBC helper in mind for the various DB types (Oracle9, Derby). This works quite nicely for the core.fs.db package and connection pooling can then be located inside that JDBC helper class. I was wondering if that could help us here. What if the Operations that are mentioned above use such as JDBC helper class?

        Show
        Martijn Hendriks added a comment - Committed in revision 800118. Using a different DB backend for testing can be done as follows: Edit the relevant properties in the pom of the jackrabbit-core's use-descriptor-overlay profile. Make sure you have the appropriate DB driver on the classpath (a MySQL driver is already there) Run mvn clean integration-test -Puse-descriptor-overlay Note that the profile drops and recreates the test database in the clean phase. I recently have been looking at refactoring the database classes a bit to remove duplication in e.g., all these bean properties (username, password, schema object prefix, etc) and, more importantly, methods like checkSchema. The idea was to have a base class, say DbSupport, with all these common properties and methods and and with a method to get a sort of JDBC helper class which encapsulates the Connection and operations on it (something like the ConnectionRecoveryManager). I had various subclasses of the JDBC helper in mind for the various DB types (Oracle9, Derby). This works quite nicely for the core.fs.db package and connection pooling can then be located inside that JDBC helper class. I was wondering if that could help us here. What if the Operations that are mentioned above use such as JDBC helper class?
        Hide
        Stefan Guggisberg added a comment -

        > Shall I commit this to the sandbox branch which has been created for this issue?

        +1

        great, thanks!

        Show
        Stefan Guggisberg added a comment - > Shall I commit this to the sandbox branch which has been created for this issue? +1 great, thanks!
        Hide
        Martijn Hendriks added a comment -

        In order to make testing on other databases than Derby easier, I've created a new profile in jackrabbit-core and new configuration files in src/test/repository-filtered.
        The profile copies the test configuration files from src/test/repository-filtered and filters them against properties specified in the pom. The core tests can then be run, e.g., against a configuration which puts all data in a MySQL database.

        I think that this makes testing the core against other than the Derby backend easier. Shall I commit this to the sandbox branch which has been created for this issue?

        Show
        Martijn Hendriks added a comment - In order to make testing on other databases than Derby easier, I've created a new profile in jackrabbit-core and new configuration files in src/test/repository-filtered. The profile copies the test configuration files from src/test/repository-filtered and filters them against properties specified in the pom. The core tests can then be run, e.g., against a configuration which puts all data in a MySQL database. I think that this makes testing the core against other than the Derby backend easier. Shall I commit this to the sandbox branch which has been created for this issue?
        Hide
        Matej Knopp added a comment -

        I'm not sure what to do about the missing methods from DataSource. We can add dummy implementation to make it compile but that doesn't really solve the problem (if the real datasource implements methods from Wrapper interface).

        Show
        Matej Knopp added a comment - I'm not sure what to do about the missing methods from DataSource. We can add dummy implementation to make it compile but that doesn't really solve the problem (if the real datasource implements methods from Wrapper interface).
        Hide
        Jukka Zitting added a comment -

        I updated the sandbox branch to match the trunk and applied the latest patch.

        BTW, the branch now only compiles on Java 5 as the DataSource wrapper class doesn't work with Java 6.

        Show
        Jukka Zitting added a comment - I updated the sandbox branch to match the trunk and applied the latest patch. BTW, the branch now only compiles on Java 5 as the DataSource wrapper class doesn't work with Java 6.
        Jukka Zitting made changes -
        Workflow jira [ 12425161 ] no-reopen-closed, patch-avail [ 12467765 ]
        Hide
        Jukka Zitting added a comment -

        We're already working on a branch, so I'm not that worried about changing things. Let's see what we come up with and then consider whether the result should be merged back as-is or perhaps copied into a new package in trunk. We should probably also set up some extra integration tests that exercise all the database types that we can easily set up.

        Show
        Jukka Zitting added a comment - We're already working on a branch, so I'm not that worried about changing things. Let's see what we come up with and then consider whether the result should be merged back as-is or perhaps copied into a new package in trunk. We should probably also set up some extra integration tests that exercise all the database types that we can easily set up.
        Hide
        Thomas Mueller added a comment -

        To reduce the risk of problems, what about creating new classes instead of patching the existing classes? Like that you could concentrate on one database type first. It's just an idea...

        Show
        Thomas Mueller added a comment - To reduce the risk of problems, what about creating new classes instead of patching the existing classes? Like that you could concentrate on one database type first. It's just an idea...
        Hide
        Jukka Zitting added a comment -

        My concerns about too many changes were mostly about having the modifications go too much beyond the o.a.j.core.persistence package (and other database-related packages). What we do inside those packages is open for discussion, and I'd personally prefer to reach a clean design that's built with connection pooling in mind than to patch the current design to work with pooled connections.

        Anyway, I think the Context class is a good starting point, and we can continue by refactoring until the design is better.

        Fair point about the one big change. If you like you can commit the full patch as-is and we can work from there in svn.

        Show
        Jukka Zitting added a comment - My concerns about too many changes were mostly about having the modifications go too much beyond the o.a.j.core.persistence package (and other database-related packages). What we do inside those packages is open for discussion, and I'd personally prefer to reach a clean design that's built with connection pooling in mind than to patch the current design to work with pooled connections. Anyway, I think the Context class is a good starting point, and we can continue by refactoring until the design is better. Fair point about the one big change. If you like you can commit the full patch as-is and we can work from there in svn.
        Hide
        Matej Knopp added a comment -

        Thanks for the feedback.

        Unfortunately most of the patch is one big change - it modifies AbstractBundlePersistenceManager so it requires all it's subclasses to be adapted.

        The context class was IMHO probably the easiest way to introduce connection pooling without requiring complete refactor/rewrite of persistence managers. Yet the patch is quite big.

        I like the idea of operations. This would however be far bigger change that what I did. I thought the idea was to introduce connection pooling with minimal fuzz. Looks like I was wrong. I will look into this. I agree that if done properly this would be much cleaner solution than passing context object around.

        Show
        Matej Knopp added a comment - Thanks for the feedback. Unfortunately most of the patch is one big change - it modifies AbstractBundlePersistenceManager so it requires all it's subclasses to be adapted. The context class was IMHO probably the easiest way to introduce connection pooling without requiring complete refactor/rewrite of persistence managers. Yet the patch is quite big. I like the idea of operations. This would however be far bigger change that what I did. I thought the idea was to introduce connection pooling with minimal fuzz. Looks like I was wrong. I will look into this. I agree that if done properly this would be much cleaner solution than passing context object around.
        Hide
        Jukka Zitting added a comment -

        Reviewing a 2000+ line patch isn't too easy. Could you split it to smaller pieces? Also, feel free to commit the incremental changes directly to the branch in the sandbox. That way we can better label, comment and discuss each step separately instead of syncing up only on aggregate patches.

        On the proposed changes: Good stuff, thanks! We're definitely seeing good progress here.

        The main concern I have is about the Context concept you're introducing. I see where you're coming, but I think there's a better way to do this. The "context" of a method call is the object on which the method is called. How about, instead of passing the Context objects around, we actually moved the recipient methods into the Context class?

        The Context class would then become something like a generic DatabaseOperation base class that encapsulates the database Connection being used for that operation. Subclasses like LoadBundleOperation, SaveChangesOperation or CheckSchemaOperation could extend this base class with specific functionality that we currently have inside the PersistenceManager (and other) classes. Database-specific extensions can be handled as yet another subclasses like OracleCheckSchemaOperation and the PersistenceManager classes would simply act as factories of these Operation instances instead of actually implementing the database functionality.

        WDYT?

        Show
        Jukka Zitting added a comment - Reviewing a 2000+ line patch isn't too easy. Could you split it to smaller pieces? Also, feel free to commit the incremental changes directly to the branch in the sandbox. That way we can better label, comment and discuss each step separately instead of syncing up only on aggregate patches. On the proposed changes: Good stuff, thanks! We're definitely seeing good progress here. The main concern I have is about the Context concept you're introducing. I see where you're coming, but I think there's a better way to do this. The "context" of a method call is the object on which the method is called. How about, instead of passing the Context objects around, we actually moved the recipient methods into the Context class? The Context class would then become something like a generic DatabaseOperation base class that encapsulates the database Connection being used for that operation. Subclasses like LoadBundleOperation, SaveChangesOperation or CheckSchemaOperation could extend this base class with specific functionality that we currently have inside the PersistenceManager (and other) classes. Database-specific extensions can be handled as yet another subclasses like OracleCheckSchemaOperation and the PersistenceManager classes would simply act as factories of these Operation instances instead of actually implementing the database functionality. WDYT?
        Matej Knopp made changes -
        Attachment 777490.patch [ 12409296 ]
        Matej Knopp made changes -
        Attachment 777490.patch [ 12409295 ]
        Matej Knopp made changes -
        Attachment 777490.patch [ 12409295 ]
        Hide
        Matej Knopp added a comment -

        Initial version of new patch. Probably lacks lot of polish. Would be nice if someone reviewed the patch and provided feedback.

        Show
        Matej Knopp added a comment - Initial version of new patch. Probably lacks lot of polish. Would be nice if someone reviewed the patch and provided feedback.
        Hide
        Jukka Zitting added a comment -

        > You create new BasicDataSource every time getDriverDataSource() is called.

        That's by design. As noted in my previous comment, I think we should replace ConnectionFactory.getConnection() with ConnectionFactory.getDataSource() and store a reference to the returned DataSource in the persistence manager.

        > I think there should only be one datasource instance per driverclass/url combo.

        Eventually yes. There the extra configuration parts that you proposed earlier will come in handy. However I think it's more straightforward if we start with one DataSource per persistence manager for now. Doing it this way we can keep the changes nicely localized within a single persistence manager. We can change the configuration mechanism or introduce some repository-local DataSource registry later on once the basic pooling functionality is there.

        Show
        Jukka Zitting added a comment - > You create new BasicDataSource every time getDriverDataSource() is called. That's by design. As noted in my previous comment, I think we should replace ConnectionFactory.getConnection() with ConnectionFactory.getDataSource() and store a reference to the returned DataSource in the persistence manager. > I think there should only be one datasource instance per driverclass/url combo. Eventually yes. There the extra configuration parts that you proposed earlier will come in handy. However I think it's more straightforward if we start with one DataSource per persistence manager for now. Doing it this way we can keep the changes nicely localized within a single persistence manager. We can change the configuration mechanism or introduce some repository-local DataSource registry later on once the basic pooling functionality is there.
        Hide
        Matej Knopp added a comment -

        There is one problem with your patch that I overlooked. You create new BasicDataSource every time getDriverDataSource() is called. I think there should only be one datasource instance per driverclass/url combo. Otherwise it just keeps creating pools.

        I can fix this easily but it will take some time. Right now I'm in process of getting BundleDbPersistenceManager and subclasses used to borrowing connections instead of relying on one shared always being available.

        Show
        Matej Knopp added a comment - There is one problem with your patch that I overlooked. You create new BasicDataSource every time getDriverDataSource() is called. I think there should only be one datasource instance per driverclass/url combo. Otherwise it just keeps creating pools. I can fix this easily but it will take some time. Right now I'm in process of getting BundleDbPersistenceManager and subclasses used to borrowing connections instead of relying on one shared always being available.
        Hide
        Jukka Zitting added a comment -

        OK, thanks. I applied the patch in revision 778741. As a next step I think we should change the ConnectionFactory to return the configured DataSource instead of a Connection object. This way we can push the DataSource reference all the way up to the PersistenceManager implementations and use it to get Connections only on demand.

        Show
        Jukka Zitting added a comment - OK, thanks. I applied the patch in revision 778741. As a next step I think we should change the ConnectionFactory to return the configured DataSource instead of a Connection object. This way we can push the DataSource reference all the way up to the PersistenceManager implementations and use it to get Connections only on demand.
        Hide
        Matej Knopp added a comment -

        I like it. I think it's a good way to start.

        Show
        Matej Knopp added a comment - I like it. I think it's a good way to start.
        Hide
        Jukka Zitting added a comment -

        What do you think about my dbcp.patch? Should I commit it to the feature branch as a starting point?

        Show
        Jukka Zitting added a comment - What do you think about my dbcp.patch? Should I commit it to the feature branch as a starting point?
        Jukka Zitting made changes -
        Attachment dbcp.patch [ 12408243 ]
        Hide
        Jukka Zitting added a comment -

        Minor update (protect against a null driver class) to dbcp.patch.

        Show
        Jukka Zitting added a comment - Minor update (protect against a null driver class) to dbcp.patch.
        Jukka Zitting made changes -
        Attachment dbcp.patch [ 12408242 ]
        Hide
        Jukka Zitting added a comment -

        > The problem here is that right now every component (persistence manager, fs,
        > journal, ...) in jackrabbit is responsible for creating the database connection
        > thus the connection properties are specified as configuration option for the
        > component. But connection pooling essentially takes this responsibility from
        > component. So how should the database properties be handled? Where
        > should the connection pool be configured?

        Right where we currently configure the single database connection per component.

        > If you have an idea how to configure connection pools and configure the
        > components to use those pools while preserving current configuration
        > scheme I'd love to know about it.

        See the attached patch (dbcp.patch) for a simple change that makes all the connections we currently create to come from connection pools. This change obviously doesn't solve the main issue, but should illustrate how I envision us to handle existing database configurations.

        > There is one way I can imagine connection pooling working without change
        > to configuration syntax. While parsing the configuration jackrabbit would create
        > datasource (with default connection pool i.e. dbcp) for each distinct connection
        > properties and then pass this datasource to the component to which the
        > connection properties belong.

        Exactly!

        Show
        Jukka Zitting added a comment - > The problem here is that right now every component (persistence manager, fs, > journal, ...) in jackrabbit is responsible for creating the database connection > thus the connection properties are specified as configuration option for the > component. But connection pooling essentially takes this responsibility from > component. So how should the database properties be handled? Where > should the connection pool be configured? Right where we currently configure the single database connection per component. > If you have an idea how to configure connection pools and configure the > components to use those pools while preserving current configuration > scheme I'd love to know about it. See the attached patch (dbcp.patch) for a simple change that makes all the connections we currently create to come from connection pools. This change obviously doesn't solve the main issue, but should illustrate how I envision us to handle existing database configurations. > There is one way I can imagine connection pooling working without change > to configuration syntax. While parsing the configuration jackrabbit would create > datasource (with default connection pool i.e. dbcp) for each distinct connection > properties and then pass this datasource to the component to which the > connection properties belong. Exactly!
        Hide
        Matej Knopp added a comment -

        > We can always change the superclass.

        Well, the superclass methods could pass a context object around that the subclass could use to store connection.

        > Not if it's a member of an extra object that's instantiated per each top level method call.

        Yes, but then the superclass needs to pass this object as method argument. Which is probably okay.

        > For now I'm mostly worried about the patch being more complex than it needs to be. We can add config entries once the basic functionality is in place.

        The problem here is that right now every component (persistence manager, fs, journal, ...) in jackrabbit is responsible for creating the database connection thus the connection properties are specified as configuration option for the component. But connection pooling essentially takes this responsibility from component. So how should the database properties be handled? Where should the connection pool be configured?

        My previous patch solved this in way that allowed propagation of connection properties from components to connection pool and didn't require any changes to configuration files but that came at cost of readability and complexity. Current way of configuring database connections in jackrabbit assumes the connection is created by the component but this is really not compatible with connection pools.

        The cleanest solution would be to add section in configuration to define data sources and then way to assign a data source to each component that needs database connection. This obviously requires changing the configuration scheme though.

        If you have an idea how to configure connection pools and configure the components to use those pools while preserving current configuration scheme I'd love to know about it. The approach I tried worked but it resulted in code apparently too complicated to be committed.

        There is one way I can imagine connection pooling working without change to configuration syntax. While parsing the configuration jackrabbit would create datasource (with default connection pool i.e. dbcp) for each distinct connection properties and then pass this datasource to the component to which the connection properties belong.

        This would make the configuration parsing code bit more complicated but wouldn't require any change to configuration files.

        > So we don't need to implement one. Note that the solution should work with existing Jackrabbit configurations that do not specify a connection pool. There's no need for DBCP when a JNDI DataSource is configured, but it makes things a lot easier for non-JNDI configurations.

        I agree that jackrabbit should have connection pooling configured by default but that should be overridable in configuration file.

        > What's SimplePoolingConnectionProvider for then?

        That's just a simple connection pool i used for development. It was meant to be kind of "Default" connection provider and the connection pooling logic should be replaced with DBCP should people have agried with the approach i taked with the patch.

        Show
        Matej Knopp added a comment - > We can always change the superclass. Well, the superclass methods could pass a context object around that the subclass could use to store connection. > Not if it's a member of an extra object that's instantiated per each top level method call. Yes, but then the superclass needs to pass this object as method argument. Which is probably okay. > For now I'm mostly worried about the patch being more complex than it needs to be. We can add config entries once the basic functionality is in place. The problem here is that right now every component (persistence manager, fs, journal, ...) in jackrabbit is responsible for creating the database connection thus the connection properties are specified as configuration option for the component. But connection pooling essentially takes this responsibility from component. So how should the database properties be handled? Where should the connection pool be configured? My previous patch solved this in way that allowed propagation of connection properties from components to connection pool and didn't require any changes to configuration files but that came at cost of readability and complexity. Current way of configuring database connections in jackrabbit assumes the connection is created by the component but this is really not compatible with connection pools. The cleanest solution would be to add section in configuration to define data sources and then way to assign a data source to each component that needs database connection. This obviously requires changing the configuration scheme though. If you have an idea how to configure connection pools and configure the components to use those pools while preserving current configuration scheme I'd love to know about it. The approach I tried worked but it resulted in code apparently too complicated to be committed. There is one way I can imagine connection pooling working without change to configuration syntax. While parsing the configuration jackrabbit would create datasource (with default connection pool i.e. dbcp) for each distinct connection properties and then pass this datasource to the component to which the connection properties belong. This would make the configuration parsing code bit more complicated but wouldn't require any change to configuration files. > So we don't need to implement one. Note that the solution should work with existing Jackrabbit configurations that do not specify a connection pool. There's no need for DBCP when a JNDI DataSource is configured, but it makes things a lot easier for non-JNDI configurations. I agree that jackrabbit should have connection pooling configured by default but that should be overridable in configuration file. > What's SimplePoolingConnectionProvider for then? That's just a simple connection pool i used for development. It was meant to be kind of "Default" connection provider and the connection pooling logic should be replaced with DBCP should people have agried with the approach i taked with the patch.
        Hide
        Jukka Zitting added a comment -

        > The problem here is that BundleDbPersistenceManager superclass is connection
        > agnostic so passing the database connection as argument doesn't seem to be an option.

        We can always change the superclass.

        > Storing that as member variable is, but it requires additional locking.

        Not if it's a member of an extra object that's instantiated per each top level method call.

        > As far as I can remember all introduced configuration entities were completely optional
        > with sane default making the change pretty much transparent for user.

        For now I'm mostly worried about the patch being more complex than it needs to be. We can add config entries once the basic functionality is in place.

        > What's the point of hardwiring concrete connection pool?

        So we don't need to implement one. Note that the solution should work with existing Jackrabbit configurations that do not specify a connection pool. There's no need for DBCP when a JNDI DataSource is configured, but it makes things a lot easier for non-JNDI configurations.

        > In my previous patch I didn't attempt to do any connection pooling.

        What's SimplePoolingConnectionProvider for then?

        > The reason why I chose not to use DataSource is because DataSource has no means to
        > pass connection properties (url, driver, etc).

        That's what we'd use Commons DBCP for (in cases when a JNDI DataSource has not already been configured).

        > The custom sandbox could work for me.

        OK, good. I created such a development branch in https://svn.apache.org/repos/asf/jackrabbit/sandbox/JCR-1456.

        Show
        Jukka Zitting added a comment - > The problem here is that BundleDbPersistenceManager superclass is connection > agnostic so passing the database connection as argument doesn't seem to be an option. We can always change the superclass. > Storing that as member variable is, but it requires additional locking. Not if it's a member of an extra object that's instantiated per each top level method call. > As far as I can remember all introduced configuration entities were completely optional > with sane default making the change pretty much transparent for user. For now I'm mostly worried about the patch being more complex than it needs to be. We can add config entries once the basic functionality is in place. > What's the point of hardwiring concrete connection pool? So we don't need to implement one. Note that the solution should work with existing Jackrabbit configurations that do not specify a connection pool. There's no need for DBCP when a JNDI DataSource is configured, but it makes things a lot easier for non-JNDI configurations. > In my previous patch I didn't attempt to do any connection pooling. What's SimplePoolingConnectionProvider for then? > The reason why I chose not to use DataSource is because DataSource has no means to > pass connection properties (url, driver, etc). That's what we'd use Commons DBCP for (in cases when a JNDI DataSource has not already been configured). > The custom sandbox could work for me. OK, good. I created such a development branch in https://svn.apache.org/repos/asf/jackrabbit/sandbox/JCR-1456 .
        Hide
        Matej Knopp added a comment -

        >Using ThreadLocal for the connection seems unnecessarily complex (from the perspective of someone new trying to understand the code), it's better to pass the connection around as a method argument or encapsulate it as a member variable of an object that performs the database operations

        The problem here is that BundleDbPersistenceManager superclass is connection agnostic so passing the database connection as argument doesn't seem to be an option. Storing that as member variable is, but it requires additional locking. To reduce unnecessary locking I decided to use thread locals. Since it's perceived to be too complicated there won't be any in next patch.

        > Could you implement this without introducing new configuration entries? We may consider adding that later, but it would be clearer if we first implemented connection pooling with the access configuration that we currently have.

        As far as I can remember all introduced configuration entities were completely optional with sane default making the change pretty much transparent for user.

        > We should leverage something like Commons DBCP instead of implementing our own connection pooling logic. Commons DBCP is much better than anything that we could come up with.

        What's the point of hardwiring concrete connection pool? In my previous patch I didn't attempt to do any connection pooling. I just had a connection source which could be easily implemented to get the connection from any connection pool or JNDI. If you insist though on using commons dbcp I can live with that though.

        > Related to the above, we should use the standard DataSource interface interface instead of a custom ConnectionManager class. This would nicely abstract away all the pooling logic and make the code much more familiar to people who already know JDBC. All top-level methods would look like something like this

        The connection manager in patch didn't create database connections. It delegated that to a connection provider. The purpose of ConnectionManager were some convenience methods and prepared statements caching. The connection provider was simple DataSource like interface. The reason why I chose not to use DataSource is because DataSource has no means to pass connection properties (url, driver, etc). That normally isn't a problem, in jackrabbit however it is, because the database properties are specified on components level (persistence manager, file system, journal, etc).

        If PersistenceManager is supposed to have connection properties (which you seem to insist on in order not to change configuration) then it needs a way to pass these information to code that manages database connections. And DataSource doesn't provide any means for it.

        The custom sandbox could work for me.

        Thanks for your input, it's very appreciated.

        Show
        Matej Knopp added a comment - >Using ThreadLocal for the connection seems unnecessarily complex (from the perspective of someone new trying to understand the code), it's better to pass the connection around as a method argument or encapsulate it as a member variable of an object that performs the database operations The problem here is that BundleDbPersistenceManager superclass is connection agnostic so passing the database connection as argument doesn't seem to be an option. Storing that as member variable is, but it requires additional locking. To reduce unnecessary locking I decided to use thread locals. Since it's perceived to be too complicated there won't be any in next patch. > Could you implement this without introducing new configuration entries? We may consider adding that later, but it would be clearer if we first implemented connection pooling with the access configuration that we currently have. As far as I can remember all introduced configuration entities were completely optional with sane default making the change pretty much transparent for user. > We should leverage something like Commons DBCP instead of implementing our own connection pooling logic. Commons DBCP is much better than anything that we could come up with. What's the point of hardwiring concrete connection pool? In my previous patch I didn't attempt to do any connection pooling. I just had a connection source which could be easily implemented to get the connection from any connection pool or JNDI. If you insist though on using commons dbcp I can live with that though. > Related to the above, we should use the standard DataSource interface interface instead of a custom ConnectionManager class. This would nicely abstract away all the pooling logic and make the code much more familiar to people who already know JDBC. All top-level methods would look like something like this The connection manager in patch didn't create database connections. It delegated that to a connection provider. The purpose of ConnectionManager were some convenience methods and prepared statements caching. The connection provider was simple DataSource like interface. The reason why I chose not to use DataSource is because DataSource has no means to pass connection properties (url, driver, etc). That normally isn't a problem, in jackrabbit however it is, because the database properties are specified on components level (persistence manager, file system, journal, etc). If PersistenceManager is supposed to have connection properties (which you seem to insist on in order not to change configuration) then it needs a way to pass these information to code that manages database connections. And DataSource doesn't provide any means for it. The custom sandbox could work for me. Thanks for your input, it's very appreciated.
        Hide
        Jukka Zitting added a comment -

        I think the main concerns raised above are:

        • The entire store(ChangeLog) operation needs to happen atomically
        • Using ThreadLocal for the connection seems unnecessarily complex (from the perspective of someone new trying to understand the code), it's better to pass the connection around as a method argument or encapsulate it as a member variable of an object that performs the database operations

        I also have some extra concerns:

        • Could you implement this without introducing new configuration entries? We may consider adding that later, but it would be clearer if we first implemented connection pooling with the access configuration that we currently have.
        • We should leverage something like Commons DBCP instead of implementing our own connection pooling logic. Commons DBCP is much better than anything that we could come up with.
        • Related to the above, we should use the standard DataSource interface interface instead of a custom ConnectionManager class. This would nicely abstract away all the pooling logic and make the code much more familiar to people who already know JDBC. All top-level methods would look like something like this:

        public void doSomething() throws SQLException {
        Connection connection = dataSource.getConnection();
        try

        { // do something with the connection }

        finally

        { connection.close(); }

        }

        The above are of course just individual opinions. Feel free to argue otherwise if you have a better solution.

        PS. The Jackrabbit sandbox is nowadays open to all Apache committers, so if you may want to create a development branch of Jackrabbit trunk (or the 1.x branch) in the sandbox for this work. The changes here are so extensive that it may be easier for us to work incrementally through svn.

        PPS. Alternatively, if you know Git, you may want to clone git://git.apache.org/jackrabbit.git and publish your changes for example on Github.

        Show
        Jukka Zitting added a comment - I think the main concerns raised above are: The entire store(ChangeLog) operation needs to happen atomically Using ThreadLocal for the connection seems unnecessarily complex (from the perspective of someone new trying to understand the code), it's better to pass the connection around as a method argument or encapsulate it as a member variable of an object that performs the database operations I also have some extra concerns: Could you implement this without introducing new configuration entries? We may consider adding that later, but it would be clearer if we first implemented connection pooling with the access configuration that we currently have. We should leverage something like Commons DBCP instead of implementing our own connection pooling logic. Commons DBCP is much better than anything that we could come up with. Related to the above, we should use the standard DataSource interface interface instead of a custom ConnectionManager class. This would nicely abstract away all the pooling logic and make the code much more familiar to people who already know JDBC. All top-level methods would look like something like this: public void doSomething() throws SQLException { Connection connection = dataSource.getConnection(); try { // do something with the connection } finally { connection.close(); } } The above are of course just individual opinions. Feel free to argue otherwise if you have a better solution. PS. The Jackrabbit sandbox is nowadays open to all Apache committers, so if you may want to create a development branch of Jackrabbit trunk (or the 1.x branch) in the sandbox for this work. The changes here are so extensive that it may be easier for us to work incrementally through svn. PPS. Alternatively, if you know Git, you may want to clone git://git.apache.org/jackrabbit.git and publish your changes for example on Github.
        Hide
        Matej Knopp added a comment -

        What are the requirements for the patch? I'm willing to provide one against current trunk but I have to know upfront what's expected of it so that it doesn't end like the previous attempt.

        Show
        Matej Knopp added a comment - What are the requirements for the patch? I'm willing to provide one against current trunk but I have to know upfront what's expected of it so that it doesn't end like the previous attempt.
        Hide
        Brian Topping added a comment -

        There's been two separate revisions of the original patch and it was never applied. What guarantee does the community or a contributor have that this patch would be applied if it were reworked to be current?

        Show
        Brian Topping added a comment - There's been two separate revisions of the original patch and it was never applied. What guarantee does the community or a contributor have that this patch would be applied if it were reworked to be current?
        Hide
        Jukka Zitting added a comment -

        As far as I know, nobody is actively working on this. We'll include this in 1.6 if someone comes up with a patch that everyone agrees on by that time.

        Show
        Jukka Zitting added a comment - As far as I know, nobody is actively working on this. We'll include this in 1.6 if someone comes up with a patch that everyone agrees on by that time.
        Hide
        Sunil D added a comment -

        Just wanted to ask what the status of this ticket is. I see it's unassigned and there's no fix version right now; is it still planned for 1.6?

        Show
        Sunil D added a comment - Just wanted to ask what the status of this ticket is. I see it's unassigned and there's no fix version right now; is it still planned for 1.6?
        Jukka Zitting made changes -
        Fix Version/s 1.5 [ 12312920 ]
        Hide
        Jukka Zitting added a comment -

        Let's postpone this to 1.6.

        Show
        Jukka Zitting added a comment - Let's postpone this to 1.6.
        Hide
        Thomas Mueller added a comment -

        Hi Dave,

        The connection properties you described are unrelated to "Database connection pooling", right?
        If yes then I suggest to open another issue.

        Regards,
        Thomas

        Show
        Thomas Mueller added a comment - Hi Dave, The connection properties you described are unrelated to "Database connection pooling", right? If yes then I suggest to open another issue. Regards, Thomas
        Hide
        Dave Brosius added a comment -

        I like the idea of this patch, but i think the ConnectionProperties is too specific. It should just be

        private String url;
        private String driver;
        private Properties connectionProperties.

        for instance, i would like to add

        properties.put("oracle.net.ssl_cipher_suites", "(SSL_DH_anon_WITH_3DES_EDE_CBC_SHA, SSL_DH_anon_WITH_RC4_128_MD5, SSL_DH_anon_WITH_DES_CBC_SHA)";

        so that i can connect to the jackrabbit database over SSL.

        And of course a similar implication for the repository.xml, to include arbitrary connection properties.

        Show
        Dave Brosius added a comment - I like the idea of this patch, but i think the ConnectionProperties is too specific. It should just be private String url; private String driver; private Properties connectionProperties. for instance, i would like to add properties.put("oracle.net.ssl_cipher_suites", "(SSL_DH_anon_WITH_3DES_EDE_CBC_SHA, SSL_DH_anon_WITH_RC4_128_MD5, SSL_DH_anon_WITH_DES_CBC_SHA)"; so that i can connect to the jackrabbit database over SSL. And of course a similar implication for the repository.xml, to include arbitrary connection properties.
        Hide
        Thomas Mueller added a comment -

        Thanks for the patch!

        > It's really not complicated at all. There is one class (ThreadLocalConnectionProviderAdapter...

        This class does look complicated to me. To avoid ThreadLocal, what about:

        BundleDbPersistenceManager {
        Connection currentConnection
        synchronized store(..) {
        try

        { currentConnection = ... super.store(..) }

        finally

        { currentConnection = null }

        }

        > the patch hasn't been heavily tested in "real world"

        I have already said, manual tests and real world tests are a maintenance problem. If there is no automated test, each change is big risk. Maintaining and improving the code is very hard in this case.

        Show
        Thomas Mueller added a comment - Thanks for the patch! > It's really not complicated at all. There is one class (ThreadLocalConnectionProviderAdapter... This class does look complicated to me. To avoid ThreadLocal, what about: BundleDbPersistenceManager { Connection currentConnection synchronized store(..) { try { currentConnection = ... super.store(..) } finally { currentConnection = null } } > the patch hasn't been heavily tested in "real world" I have already said, manual tests and real world tests are a maintenance problem. If there is no automated test, each change is big risk. Maintaining and improving the code is very hard in this case.
        Matej Knopp made changes -
        Attachment patch-1456-3.txt [ 12388940 ]
        Hide
        Matej Knopp added a comment -

        Patch for current trunk (rev 672286)

        Show
        Matej Knopp added a comment - Patch for current trunk (rev 672286)
        Hide
        Matej Knopp added a comment -

        > What about doing that for DbDataStore first? The patch would be much smaller.

        Well, the file system is also instantiated per workspace so I believe connection pooling makes sense there. But it can be excluded from the patch, that shouldn't be a big issue.
        Same goes for DataStore. But while you can live without Db file system, DbDateStore is more or less necessary for clustered environments and the lack of connection pooling can be a serious issue there.

        > That sounds too complicated, too tricky, and too slow for me. For store(ChangeLog), why not simply pass the connection object to the nested calls?

        Because the nested calls are invoked from BundleDbPersistenceManager which doesn't know about database connection.

        It's really not complicated at all. There is one class (ThreadLocalConnectionProviderAdapter) that makes sure that getConnection() returns same connection for "nested" calls.
        Also, could you please be more specific about what exactly sounds too slow about this?

        > Just to make sure: You mean automated tests, right? Manual tests is a maintenance problem.
        Well, all unit tests that work with "vanilla" jackrabbit also work after the patch is applied. However the patch hasn't been heavily tested in "real world" environment or with different database backends.

        Show
        Matej Knopp added a comment - > What about doing that for DbDataStore first? The patch would be much smaller. Well, the file system is also instantiated per workspace so I believe connection pooling makes sense there. But it can be excluded from the patch, that shouldn't be a big issue. Same goes for DataStore. But while you can live without Db file system, DbDateStore is more or less necessary for clustered environments and the lack of connection pooling can be a serious issue there. > That sounds too complicated, too tricky, and too slow for me. For store(ChangeLog), why not simply pass the connection object to the nested calls? Because the nested calls are invoked from BundleDbPersistenceManager which doesn't know about database connection. It's really not complicated at all. There is one class (ThreadLocalConnectionProviderAdapter) that makes sure that getConnection() returns same connection for "nested" calls. Also, could you please be more specific about what exactly sounds too slow about this? > Just to make sure: You mean automated tests, right? Manual tests is a maintenance problem. Well, all unit tests that work with "vanilla" jackrabbit also work after the patch is applied. However the patch hasn't been heavily tested in "real world" environment or with different database backends.
        Hide
        Thomas Mueller added a comment -

        What about doing that for DbDataStore first? The patch would be much smaller.

        > attaching active connection to current thread, so that the nested calls would always obtain the active connection

        That sounds too complicated, too tricky, and too slow for me. For store(ChangeLog), why not simply pass the connection object to the nested calls?

        > require lot of testing

        Just to make sure: You mean automated tests, right? Manual tests is a maintenance problem.

        Show
        Thomas Mueller added a comment - What about doing that for DbDataStore first? The patch would be much smaller. > attaching active connection to current thread, so that the nested calls would always obtain the active connection That sounds too complicated, too tricky, and too slow for me. For store(ChangeLog), why not simply pass the connection object to the nested calls? > require lot of testing Just to make sure: You mean automated tests, right? Manual tests is a maintenance problem.
        Matej Knopp made changes -
        Attachment patch-1456-2.txt [ 12387902 ]
        Hide
        Matej Knopp added a comment - - edited

        Added ConnectionPooling for DatabaseFileSystem and DbDataStore.
        Database connections are thread bound if necessary.
        Checked again checkstyle.
        Not thoroughly tested.

        Show
        Matej Knopp added a comment - - edited Added ConnectionPooling for DatabaseFileSystem and DbDataStore. Database connections are thread bound if necessary. Checked again checkstyle. Not thoroughly tested.
        Hide
        Matej Knopp added a comment -

        Hi,

        Thanks a lot for the comment. You're right, there might be a problem with different connections obtained. This could be handled by attaching active connection to current thread, so that the nested calls would always obtain the active connection. Anyway, I will look into it and post a new patch.

        I agree that this is a substantial change and will require lot of testing. But i think at some point it will be necessary to bite the bullet and implement connection pooling, whether it will be based on my patch or not. The current situation is rather problematic, keeping opened connection per workspace doesn't scale well at all.

        Show
        Matej Knopp added a comment - Hi, Thanks a lot for the comment. You're right, there might be a problem with different connections obtained. This could be handled by attaching active connection to current thread, so that the nested calls would always obtain the active connection. Anyway, I will look into it and post a new patch. I agree that this is a substantial change and will require lot of testing. But i think at some point it will be necessary to bite the bullet and implement connection pooling, whether it will be based on my patch or not. The current situation is rather problematic, keeping opened connection per workspace doesn't scale well at all.
        Hide
        Stefan Guggisberg added a comment -

        thanks for the patch, matej. that's very much appreciated.

        i quickly browsed through the diff and noticed the following issue:

        it seems like a connection is retrieved from the pool in every
        PersistenceManager method. that's probably fine for
        reading methods but that's not gonna work for writing methods
        since they need to use the same connection (i.e. transaction).
        all method calls within the store(ChangeLog) scope need
        to use the same connection (with autoCommit set to false),
        otherwise you'll end up with inconsistent/brokem repositories.

        i am also a bit concerned about the impact of the proposed change
        since it touches a lot of current code. the patch would have to be
        thoroughly tested with all currently supported backends...

        Show
        Stefan Guggisberg added a comment - thanks for the patch, matej. that's very much appreciated. i quickly browsed through the diff and noticed the following issue: it seems like a connection is retrieved from the pool in every PersistenceManager method. that's probably fine for reading methods but that's not gonna work for writing methods since they need to use the same connection (i.e. transaction). all method calls within the store(ChangeLog) scope need to use the same connection (with autoCommit set to false), otherwise you'll end up with inconsistent/brokem repositories. i am also a bit concerned about the impact of the proposed change since it touches a lot of current code. the patch would have to be thoroughly tested with all currently supported backends...
        Matej Knopp made changes -
        Field Original Value New Value
        Attachment patch-1456-1.txt [ 12386588 ]
        Hide
        Matej Knopp added a comment -

        Proof of concept patch.

        • Abstracts database connection creation to allow pluggable pooling implementation.
        • Not thoroughly tested and also not checked against checkstyle.
        • So far it only covers BundleDbPersistenceManager and it's subclasses. All other components (db journal, db filesystem still use ConnectionRecoveryManager).
        Show
        Matej Knopp added a comment - Proof of concept patch. Abstracts database connection creation to allow pluggable pooling implementation. Not thoroughly tested and also not checked against checkstyle. So far it only covers BundleDbPersistenceManager and it's subclasses. All other components (db journal, db filesystem still use ConnectionRecoveryManager).
        Hide
        Matej Knopp added a comment -

        Any update on this? Is there any estimation or is there a patch expected?

        Show
        Matej Knopp added a comment - Any update on this? Is there any estimation or is there a patch expected?
        Hide
        Jukka Zitting added a comment -

        > Is it possible to have, in those areas, the same problem reported in JCR-1388?

        Connection pools would nicely solve most of our concurrent access issues, as we wouldn't be constrained to a single connection by default and wouldn't need workarounds like the one in JCR-1388.

        > The idea is to use a pool package or to build our own?

        I'd leverage a pooling DataSource whenever available (JNDI configuration), and use commons-dbcp to pool explicitly configured connections (JDBC Driver configuration).

        Show
        Jukka Zitting added a comment - > Is it possible to have, in those areas, the same problem reported in JCR-1388 ? Connection pools would nicely solve most of our concurrent access issues, as we wouldn't be constrained to a single connection by default and wouldn't need workarounds like the one in JCR-1388 . > The idea is to use a pool package or to build our own? I'd leverage a pooling DataSource whenever available (JNDI configuration), and use commons-dbcp to pool explicitly configured connections (JDBC Driver configuration).
        Hide
        Esteban Franqueiro added a comment -

        Is it possible to have, in those areas, the same problem reported in JCR-1388?
        The idea is to use a pool package or to build our own?

        Show
        Esteban Franqueiro added a comment - Is it possible to have, in those areas, the same problem reported in JCR-1388 ? The idea is to use a pool package or to build our own?
        Jukka Zitting created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Jukka Zitting
          • Votes:
            21 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development