Esteban Franqueiro added a comment - 05/Mar/08 06:38 PM Is it possible to have, in those areas, the same problem reported in JCR-1388?
The idea is to use a pool package or to build our own?
> Is it possible to have, in those areas, the same problem reported in JCR-1388?
Connection pools would nicely solve most of our concurrent access issues, as we wouldn't be constrained to a single connection by default and wouldn't need workarounds like the one in JCR-1388.
> The idea is to use a pool package or to build our own?
I'd leverage a pooling DataSource whenever available (JNDI configuration), and use commons-dbcp to pool explicitly configured connections (JDBC Driver configuration).
Jukka Zitting added a comment - 06/Mar/08 08:47 AM > Is it possible to have, in those areas, the same problem reported in JCR-1388?
Connection pools would nicely solve most of our concurrent access issues, as we wouldn't be constrained to a single connection by default and wouldn't need workarounds like the one in JCR-1388.
> The idea is to use a pool package or to build our own?
I'd leverage a pooling DataSource whenever available (JNDI configuration), and use commons-dbcp to pool explicitly configured connections (JDBC Driver configuration).
* Abstracts database connection creation to allow pluggable pooling implementation.
* Not thoroughly tested and also not checked against checkstyle.
* So far it only covers BundleDbPersistenceManager and it's subclasses. All other components (db journal, db filesystem still use ConnectionRecoveryManager).
Matej Knopp added a comment - 21/Jul/08 11:19 PM Proof of concept patch.
* Abstracts database connection creation to allow pluggable pooling implementation.
* Not thoroughly tested and also not checked against checkstyle.
* So far it only covers BundleDbPersistenceManager and it's subclasses. All other components (db journal, db filesystem still use ConnectionRecoveryManager).
thanks for the patch, matej. that's very much appreciated.
i quickly browsed through the diff and noticed the following issue:
it seems like a connection is retrieved from the pool in every
PersistenceManager method. that's probably fine for
reading methods but that's not gonna work for writing methods
since they need to use the same connection (i.e. transaction).
all method calls within the store(ChangeLog) scope need
to use the same connection (with autoCommit set to false),
otherwise you'll end up with inconsistent/brokem repositories.
i am also a bit concerned about the impact of the proposed change
since it touches a lot of current code. the patch would have to be
thoroughly tested with all currently supported backends...
Stefan Guggisberg added a comment - 23/Jul/08 12:43 PM thanks for the patch, matej. that's very much appreciated.
i quickly browsed through the diff and noticed the following issue:
it seems like a connection is retrieved from the pool in every
PersistenceManager method. that's probably fine for
reading methods but that's not gonna work for writing methods
since they need to use the same connection (i.e. transaction).
all method calls within the store(ChangeLog) scope need
to use the same connection (with autoCommit set to false),
otherwise you'll end up with inconsistent/brokem repositories.
i am also a bit concerned about the impact of the proposed change
since it touches a lot of current code. the patch would have to be
thoroughly tested with all currently supported backends...
Thanks a lot for the comment. You're right, there might be a problem with different connections obtained. This could be handled by attaching active connection to current thread, so that the nested calls would always obtain the active connection. Anyway, I will look into it and post a new patch.
I agree that this is a substantial change and will require lot of testing. But i think at some point it will be necessary to bite the bullet and implement connection pooling, whether it will be based on my patch or not. The current situation is rather problematic, keeping opened connection per workspace doesn't scale well at all.
Matej Knopp added a comment - 23/Jul/08 02:36 PM Hi,
Thanks a lot for the comment. You're right, there might be a problem with different connections obtained. This could be handled by attaching active connection to current thread, so that the nested calls would always obtain the active connection. Anyway, I will look into it and post a new patch.
I agree that this is a substantial change and will require lot of testing. But i think at some point it will be necessary to bite the bullet and implement connection pooling, whether it will be based on my patch or not. The current situation is rather problematic, keeping opened connection per workspace doesn't scale well at all.
Matej Knopp added a comment - 10/Aug/08 04:56 PM - edited
Added ConnectionPooling for DatabaseFileSystem and DbDataStore.
Database connections are thread bound if necessary.
Checked again checkstyle.
Not thoroughly tested.
Matej Knopp added a comment - 10/Aug/08 04:56 PM - edited Added ConnectionPooling for DatabaseFileSystem and DbDataStore.
Database connections are thread bound if necessary.
Checked again checkstyle.
Not thoroughly tested.
Thomas Mueller added a comment - 18/Aug/08 05:57 PM What about doing that for DbDataStore first? The patch would be much smaller.
> attaching active connection to current thread, so that the nested calls would always obtain the active connection
That sounds too complicated, too tricky, and too slow for me. For store(ChangeLog), why not simply pass the connection object to the nested calls?
> require lot of testing
Just to make sure: You mean automated tests, right? Manual tests is a maintenance problem.
> What about doing that for DbDataStore first? The patch would be much smaller.
Well, the file system is also instantiated per workspace so I believe connection pooling makes sense there. But it can be excluded from the patch, that shouldn't be a big issue.
Same goes for DataStore. But while you can live without Db file system, DbDateStore is more or less necessary for clustered environments and the lack of connection pooling can be a serious issue there.
> That sounds too complicated, too tricky, and too slow for me. For store(ChangeLog), why not simply pass the connection object to the nested calls?
Because the nested calls are invoked from BundleDbPersistenceManager which doesn't know about database connection.
It's really not complicated at all. There is one class (ThreadLocalConnectionProviderAdapter) that makes sure that getConnection() returns same connection for "nested" calls.
Also, could you please be more specific about what exactly sounds too slow about this?
> Just to make sure: You mean automated tests, right? Manual tests is a maintenance problem.
Well, all unit tests that work with "vanilla" jackrabbit also work after the patch is applied. However the patch hasn't been heavily tested in "real world" environment or with different database backends.
Matej Knopp added a comment - 18/Aug/08 08:33 PM > What about doing that for DbDataStore first? The patch would be much smaller.
Well, the file system is also instantiated per workspace so I believe connection pooling makes sense there. But it can be excluded from the patch, that shouldn't be a big issue.
Same goes for DataStore. But while you can live without Db file system, DbDateStore is more or less necessary for clustered environments and the lack of connection pooling can be a serious issue there.
> That sounds too complicated, too tricky, and too slow for me. For store(ChangeLog), why not simply pass the connection object to the nested calls?
Because the nested calls are invoked from BundleDbPersistenceManager which doesn't know about database connection.
It's really not complicated at all. There is one class (ThreadLocalConnectionProviderAdapter) that makes sure that getConnection() returns same connection for "nested" calls.
Also, could you please be more specific about what exactly sounds too slow about this?
> Just to make sure: You mean automated tests, right? Manual tests is a maintenance problem.
Well, all unit tests that work with "vanilla" jackrabbit also work after the patch is applied. However the patch hasn't been heavily tested in "real world" environment or with different database backends.
> the patch hasn't been heavily tested in "real world"
I have already said, manual tests and real world tests are a maintenance problem. If there is no automated test, each change is big risk. Maintaining and improving the code is very hard in this case.
Thomas Mueller added a comment - 28/Aug/08 12:08 AM Thanks for the patch!
> It's really not complicated at all. There is one class (ThreadLocalConnectionProviderAdapter...
This class does look complicated to me. To avoid ThreadLocal, what about:
BundleDbPersistenceManager {
Connection currentConnection
synchronized store(..) {
try {
currentConnection = ...
super.store(..)
} finally {
currentConnection = null
}
}
> the patch hasn't been heavily tested in "real world"
I have already said, manual tests and real world tests are a maintenance problem. If there is no automated test, each change is big risk. Maintaining and improving the code is very hard in this case.
Dave Brosius added a comment - 02/Sep/08 05:19 PM I like the idea of this patch, but i think the ConnectionProperties is too specific. It should just be
private String url;
private String driver;
private Properties connectionProperties.
for instance, i would like to add
properties.put("oracle.net.ssl_cipher_suites", "(SSL_DH_anon_WITH_3DES_EDE_CBC_SHA, SSL_DH_anon_WITH_RC4_128_MD5, SSL_DH_anon_WITH_DES_CBC_SHA)";
so that i can connect to the jackrabbit database over SSL.
And of course a similar implication for the repository.xml, to include arbitrary connection properties.
Thomas Mueller added a comment - 09/Sep/08 08:55 AM Hi Dave,
The connection properties you described are unrelated to "Database connection pooling", right?
If yes then I suggest to open another issue.
Regards,
Thomas
Sunil D'Monte added a comment - 04/Mar/09 05:46 AM Just wanted to ask what the status of this ticket is. I see it's unassigned and there's no fix version right now; is it still planned for 1.6?
Jukka Zitting added a comment - 05/Mar/09 09:12 AM As far as I know, nobody is actively working on this. We'll include this in 1.6 if someone comes up with a patch that everyone agrees on by that time.
There's been two separate revisions of the original patch and it was never applied. What guarantee does the community or a contributor have that this patch would be applied if it were reworked to be current?
Brian Topping added a comment - 15/Apr/09 06:14 PM There's been two separate revisions of the original patch and it was never applied. What guarantee does the community or a contributor have that this patch would be applied if it were reworked to be current?
What are the requirements for the patch? I'm willing to provide one against current trunk but I have to know upfront what's expected of it so that it doesn't end like the previous attempt.
Matej Knopp added a comment - 14/May/09 06:03 PM What are the requirements for the patch? I'm willing to provide one against current trunk but I have to know upfront what's expected of it so that it doesn't end like the previous attempt.
* The entire store(ChangeLog) operation needs to happen atomically
* Using ThreadLocal for the connection seems unnecessarily complex (from the perspective of someone new trying to understand the code), it's better to pass the connection around as a method argument or encapsulate it as a member variable of an object that performs the database operations
I also have some extra concerns:
* Could you implement this without introducing new configuration entries? We may consider adding that later, but it would be clearer if we first implemented connection pooling with the access configuration that we currently have.
* We should leverage something like Commons DBCP instead of implementing our own connection pooling logic. Commons DBCP is much better than anything that we could come up with.
* Related to the above, we should use the standard DataSource interface interface instead of a custom ConnectionManager class. This would nicely abstract away all the pooling logic and make the code much more familiar to people who already know JDBC. All top-level methods would look like something like this:
public void doSomething() throws SQLException {
Connection connection = dataSource.getConnection();
try {
// do something with the connection
} finally {
connection.close();
}
}
The above are of course just individual opinions. Feel free to argue otherwise if you have a better solution.
PS. The Jackrabbit sandbox is nowadays open to all Apache committers, so if you may want to create a development branch of Jackrabbit trunk (or the 1.x branch) in the sandbox for this work. The changes here are so extensive that it may be easier for us to work incrementally through svn.
PPS. Alternatively, if you know Git, you may want to clone git://git.apache.org/jackrabbit.git and publish your changes for example on Github.
Jukka Zitting added a comment - 14/May/09 06:34 PM I think the main concerns raised above are:
* The entire store(ChangeLog) operation needs to happen atomically
* Using ThreadLocal for the connection seems unnecessarily complex (from the perspective of someone new trying to understand the code), it's better to pass the connection around as a method argument or encapsulate it as a member variable of an object that performs the database operations
I also have some extra concerns:
* Could you implement this without introducing new configuration entries? We may consider adding that later, but it would be clearer if we first implemented connection pooling with the access configuration that we currently have.
* We should leverage something like Commons DBCP instead of implementing our own connection pooling logic. Commons DBCP is much better than anything that we could come up with.
* Related to the above, we should use the standard DataSource interface interface instead of a custom ConnectionManager class. This would nicely abstract away all the pooling logic and make the code much more familiar to people who already know JDBC. All top-level methods would look like something like this:
public void doSomething() throws SQLException {
Connection connection = dataSource.getConnection();
try {
// do something with the connection
} finally {
connection.close();
}
}
The above are of course just individual opinions. Feel free to argue otherwise if you have a better solution.
PS. The Jackrabbit sandbox is nowadays open to all Apache committers, so if you may want to create a development branch of Jackrabbit trunk (or the 1.x branch) in the sandbox for this work. The changes here are so extensive that it may be easier for us to work incrementally through svn.
PPS. Alternatively, if you know Git, you may want to clone git://git.apache.org/jackrabbit.git and publish your changes for example on Github.
>Using ThreadLocal for the connection seems unnecessarily complex (from the perspective of someone new trying to understand the code), it's better to pass the connection around as a method argument or encapsulate it as a member variable of an object that performs the database operations
The problem here is that BundleDbPersistenceManager superclass is connection agnostic so passing the database connection as argument doesn't seem to be an option. Storing that as member variable is, but it requires additional locking. To reduce unnecessary locking I decided to use thread locals. Since it's perceived to be too complicated there won't be any in next patch.
> Could you implement this without introducing new configuration entries? We may consider adding that later, but it would be clearer if we first implemented connection pooling with the access configuration that we currently have.
As far as I can remember all introduced configuration entities were completely optional with sane default making the change pretty much transparent for user.
> We should leverage something like Commons DBCP instead of implementing our own connection pooling logic. Commons DBCP is much better than anything that we could come up with.
What's the point of hardwiring concrete connection pool? In my previous patch I didn't attempt to do any connection pooling. I just had a connection source which could be easily implemented to get the connection from *any* connection pool or JNDI. If you insist though on using commons dbcp I can live with that though.
> Related to the above, we should use the standard DataSource interface interface instead of a custom ConnectionManager class. This would nicely abstract away all the pooling logic and make the code much more familiar to people who already know JDBC. All top-level methods would look like something like this
The connection manager in patch didn't create database connections. It delegated that to a connection provider. The purpose of ConnectionManager were some convenience methods and prepared statements caching. The connection provider was simple DataSource like interface. The reason why I chose not to use DataSource is because DataSource has no means to pass connection properties (url, driver, etc). That normally isn't a problem, in jackrabbit however it is, because the database properties are specified on components level (persistence manager, file system, journal, etc).
If PersistenceManager is supposed to have connection properties (which you seem to insist on in order not to change configuration) then it needs a way to pass these information to code that manages database connections. And DataSource doesn't provide any means for it.
Matej Knopp added a comment - 14/May/09 07:29 PM >Using ThreadLocal for the connection seems unnecessarily complex (from the perspective of someone new trying to understand the code), it's better to pass the connection around as a method argument or encapsulate it as a member variable of an object that performs the database operations
The problem here is that BundleDbPersistenceManager superclass is connection agnostic so passing the database connection as argument doesn't seem to be an option. Storing that as member variable is, but it requires additional locking. To reduce unnecessary locking I decided to use thread locals. Since it's perceived to be too complicated there won't be any in next patch.
> Could you implement this without introducing new configuration entries? We may consider adding that later, but it would be clearer if we first implemented connection pooling with the access configuration that we currently have.
As far as I can remember all introduced configuration entities were completely optional with sane default making the change pretty much transparent for user.
> We should leverage something like Commons DBCP instead of implementing our own connection pooling logic. Commons DBCP is much better than anything that we could come up with.
What's the point of hardwiring concrete connection pool? In my previous patch I didn't attempt to do any connection pooling. I just had a connection source which could be easily implemented to get the connection from *any* connection pool or JNDI. If you insist though on using commons dbcp I can live with that though.
> Related to the above, we should use the standard DataSource interface interface instead of a custom ConnectionManager class. This would nicely abstract away all the pooling logic and make the code much more familiar to people who already know JDBC. All top-level methods would look like something like this
The connection manager in patch didn't create database connections. It delegated that to a connection provider. The purpose of ConnectionManager were some convenience methods and prepared statements caching. The connection provider was simple DataSource like interface. The reason why I chose not to use DataSource is because DataSource has no means to pass connection properties (url, driver, etc). That normally isn't a problem, in jackrabbit however it is, because the database properties are specified on components level (persistence manager, file system, journal, etc).
If PersistenceManager is supposed to have connection properties (which you seem to insist on in order not to change configuration) then it needs a way to pass these information to code that manages database connections. And DataSource doesn't provide any means for it.
The custom sandbox could work for me.
Thanks for your input, it's very appreciated.
> The problem here is that BundleDbPersistenceManager superclass is connection
> agnostic so passing the database connection as argument doesn't seem to be an option.
We can always change the superclass.
> Storing that as member variable is, but it requires additional locking.
Not if it's a member of an extra object that's instantiated per each top level method call.
> As far as I can remember all introduced configuration entities were completely optional
> with sane default making the change pretty much transparent for user.
For now I'm mostly worried about the patch being more complex than it needs to be. We can add config entries once the basic functionality is in place.
> What's the point of hardwiring concrete connection pool?
So we don't need to implement one. Note that the solution should work with existing Jackrabbit configurations that do not specify a connection pool. There's no need for DBCP when a JNDI DataSource is configured, but it makes things a lot easier for non-JNDI configurations.
> In my previous patch I didn't attempt to do any connection pooling.
What's SimplePoolingConnectionProvider for then?
> The reason why I chose not to use DataSource is because DataSource has no means to
> pass connection properties (url, driver, etc).
That's what we'd use Commons DBCP for (in cases when a JNDI DataSource has not already been configured).
Jukka Zitting added a comment - 14/May/09 08:36 PM > The problem here is that BundleDbPersistenceManager superclass is connection
> agnostic so passing the database connection as argument doesn't seem to be an option.
We can always change the superclass.
> Storing that as member variable is, but it requires additional locking.
Not if it's a member of an extra object that's instantiated per each top level method call.
> As far as I can remember all introduced configuration entities were completely optional
> with sane default making the change pretty much transparent for user.
For now I'm mostly worried about the patch being more complex than it needs to be. We can add config entries once the basic functionality is in place.
> What's the point of hardwiring concrete connection pool?
So we don't need to implement one. Note that the solution should work with existing Jackrabbit configurations that do not specify a connection pool. There's no need for DBCP when a JNDI DataSource is configured, but it makes things a lot easier for non-JNDI configurations.
> In my previous patch I didn't attempt to do any connection pooling.
What's SimplePoolingConnectionProvider for then?
> The reason why I chose not to use DataSource is because DataSource has no means to
> pass connection properties (url, driver, etc).
That's what we'd use Commons DBCP for (in cases when a JNDI DataSource has not already been configured).
> The custom sandbox could work for me.
OK, good. I created such a development branch in https://svn.apache.org/repos/asf/jackrabbit/sandbox/JCR-1456.
Well, the superclass methods could pass a context object around that the subclass could use to store connection.
> Not if it's a member of an extra object that's instantiated per each top level method call.
Yes, but then the superclass needs to pass this object as method argument. Which is probably okay.
> For now I'm mostly worried about the patch being more complex than it needs to be. We can add config entries once the basic functionality is in place.
The problem here is that right now every component (persistence manager, fs, journal, ...) in jackrabbit is responsible for creating the database connection thus the connection properties are specified as configuration option for the component. But connection pooling essentially takes this responsibility from component. So how should the database properties be handled? Where should the connection pool be configured?
My previous patch solved this in way that allowed propagation of connection properties from components to connection pool and didn't require any changes to configuration files but that came at cost of readability and complexity. Current way of configuring database connections in jackrabbit assumes the connection is created by the component but this is really not compatible with connection pools.
The cleanest solution would be to add section in configuration to define data sources and then way to assign a data source to each component that needs database connection. This obviously requires changing the configuration scheme though.
If you have an idea how to configure connection pools and configure the components to use those pools while preserving current configuration scheme I'd love to know about it. The approach I tried worked but it resulted in code apparently too complicated to be committed.
There is one way I can imagine connection pooling working without change to configuration syntax. While parsing the configuration jackrabbit would create datasource (with default connection pool i.e. dbcp) for each distinct connection properties and then pass this datasource to the component to which the connection properties belong.
This would make the configuration parsing code bit more complicated but wouldn't require any change to configuration files.
> So we don't need to implement one. Note that the solution should work with existing Jackrabbit configurations that do not specify a connection pool. There's no need for DBCP when a JNDI DataSource is configured, but it makes things a lot easier for non-JNDI configurations.
I agree that jackrabbit should have connection pooling configured by default but that should be overridable in configuration file.
> What's SimplePoolingConnectionProvider for then?
That's just a simple connection pool i used for development. It was meant to be kind of "Default" connection provider and the connection pooling logic should be replaced with DBCP should people have agried with the approach i taked with the patch.
Matej Knopp added a comment - 14/May/09 09:23 PM > We can always change the superclass.
Well, the superclass methods could pass a context object around that the subclass could use to store connection.
> Not if it's a member of an extra object that's instantiated per each top level method call.
Yes, but then the superclass needs to pass this object as method argument. Which is probably okay.
> For now I'm mostly worried about the patch being more complex than it needs to be. We can add config entries once the basic functionality is in place.
The problem here is that right now every component (persistence manager, fs, journal, ...) in jackrabbit is responsible for creating the database connection thus the connection properties are specified as configuration option for the component. But connection pooling essentially takes this responsibility from component. So how should the database properties be handled? Where should the connection pool be configured?
My previous patch solved this in way that allowed propagation of connection properties from components to connection pool and didn't require any changes to configuration files but that came at cost of readability and complexity. Current way of configuring database connections in jackrabbit assumes the connection is created by the component but this is really not compatible with connection pools.
The cleanest solution would be to add section in configuration to define data sources and then way to assign a data source to each component that needs database connection. This obviously requires changing the configuration scheme though.
If you have an idea how to configure connection pools and configure the components to use those pools while preserving current configuration scheme I'd love to know about it. The approach I tried worked but it resulted in code apparently too complicated to be committed.
There is one way I can imagine connection pooling working without change to configuration syntax. While parsing the configuration jackrabbit would create datasource (with default connection pool i.e. dbcp) for each distinct connection properties and then pass this datasource to the component to which the connection properties belong.
This would make the configuration parsing code bit more complicated but wouldn't require any change to configuration files.
> So we don't need to implement one. Note that the solution should work with existing Jackrabbit configurations that do not specify a connection pool. There's no need for DBCP when a JNDI DataSource is configured, but it makes things a lot easier for non-JNDI configurations.
I agree that jackrabbit should have connection pooling configured by default but that should be overridable in configuration file.
> What's SimplePoolingConnectionProvider for then?
That's just a simple connection pool i used for development. It was meant to be kind of "Default" connection provider and the connection pooling logic should be replaced with DBCP should people have agried with the approach i taked with the patch.
> The problem here is that right now every component (persistence manager, fs,
> journal, ...) in jackrabbit is responsible for creating the database connection
> thus the connection properties are specified as configuration option for the
> component. But connection pooling essentially takes this responsibility from
> component. So how should the database properties be handled? Where
> should the connection pool be configured?
Right where we currently configure the single database connection per component.
> If you have an idea how to configure connection pools and configure the
> components to use those pools while preserving current configuration
> scheme I'd love to know about it.
See the attached patch (dbcp.patch) for a simple change that makes all the connections we currently create to come from connection pools. This change obviously doesn't solve the main issue, but should illustrate how I envision us to handle existing database configurations.
> There is one way I can imagine connection pooling working without change
> to configuration syntax. While parsing the configuration jackrabbit would create
> datasource (with default connection pool i.e. dbcp) for each distinct connection
> properties and then pass this datasource to the component to which the
> connection properties belong.
Jukka Zitting added a comment - 15/May/09 11:15 AM > The problem here is that right now every component (persistence manager, fs,
> journal, ...) in jackrabbit is responsible for creating the database connection
> thus the connection properties are specified as configuration option for the
> component. But connection pooling essentially takes this responsibility from
> component. So how should the database properties be handled? Where
> should the connection pool be configured?
Right where we currently configure the single database connection per component.
> If you have an idea how to configure connection pools and configure the
> components to use those pools while preserving current configuration
> scheme I'd love to know about it.
See the attached patch (dbcp.patch) for a simple change that makes all the connections we currently create to come from connection pools. This change obviously doesn't solve the main issue, but should illustrate how I envision us to handle existing database configurations.
> There is one way I can imagine connection pooling working without change
> to configuration syntax. While parsing the configuration jackrabbit would create
> datasource (with default connection pool i.e. dbcp) for each distinct connection
> properties and then pass this datasource to the component to which the
> connection properties belong.
Exactly!
Jukka Zitting added a comment - 22/May/09 12:10 PM What do you think about my dbcp.patch? Should I commit it to the feature branch as a starting point?
OK, thanks. I applied the patch in revision 778741. As a next step I think we should change the ConnectionFactory to return the configured DataSource instead of a Connection object. This way we can push the DataSource reference all the way up to the PersistenceManager implementations and use it to get Connections only on demand.
Jukka Zitting added a comment - 26/May/09 03:18 PM OK, thanks. I applied the patch in revision 778741. As a next step I think we should change the ConnectionFactory to return the configured DataSource instead of a Connection object. This way we can push the DataSource reference all the way up to the PersistenceManager implementations and use it to get Connections only on demand.
There is one problem with your patch that I overlooked. You create new BasicDataSource every time getDriverDataSource() is called. I think there should only be one datasource instance per driverclass/url combo. Otherwise it just keeps creating pools.
I can fix this easily but it will take some time. Right now I'm in process of getting BundleDbPersistenceManager and subclasses used to borrowing connections instead of relying on one shared always being available.
Matej Knopp added a comment - 26/May/09 04:44 PM There is one problem with your patch that I overlooked. You create new BasicDataSource every time getDriverDataSource() is called. I think there should only be one datasource instance per driverclass/url combo. Otherwise it just keeps creating pools.
I can fix this easily but it will take some time. Right now I'm in process of getting BundleDbPersistenceManager and subclasses used to borrowing connections instead of relying on one shared always being available.
> You create new BasicDataSource every time getDriverDataSource() is called.
That's by design. As noted in my previous comment, I think we should replace ConnectionFactory.getConnection() with ConnectionFactory.getDataSource() and store a reference to the returned DataSource in the persistence manager.
> I think there should only be one datasource instance per driverclass/url combo.
Eventually yes. There the extra configuration parts that you proposed earlier will come in handy. However I think it's more straightforward if we start with one DataSource per persistence manager for now. Doing it this way we can keep the changes nicely localized within a single persistence manager. We can change the configuration mechanism or introduce some repository-local DataSource registry later on once the basic pooling functionality is there.
Jukka Zitting added a comment - 26/May/09 06:10 PM > You create new BasicDataSource every time getDriverDataSource() is called.
That's by design. As noted in my previous comment, I think we should replace ConnectionFactory.getConnection() with ConnectionFactory.getDataSource() and store a reference to the returned DataSource in the persistence manager.
> I think there should only be one datasource instance per driverclass/url combo.
Eventually yes. There the extra configuration parts that you proposed earlier will come in handy. However I think it's more straightforward if we start with one DataSource per persistence manager for now. Doing it this way we can keep the changes nicely localized within a single persistence manager. We can change the configuration mechanism or introduce some repository-local DataSource registry later on once the basic pooling functionality is there.
Matej Knopp added a comment - 28/May/09 07:42 PM Initial version of new patch. Probably lacks lot of polish. Would be nice if someone reviewed the patch and provided feedback.
Reviewing a 2000+ line patch isn't too easy. Could you split it to smaller pieces? Also, feel free to commit the incremental changes directly to the branch in the sandbox. That way we can better label, comment and discuss each step separately instead of syncing up only on aggregate patches.
On the proposed changes: Good stuff, thanks! We're definitely seeing good progress here.
The main concern I have is about the Context concept you're introducing. I see where you're coming, but I think there's a better way to do this. The "context" of a method call is the object on which the method is called. How about, instead of passing the Context objects around, we actually moved the recipient methods *into* the Context class?
The Context class would then become something like a generic DatabaseOperation base class that encapsulates the database Connection being used for that operation. Subclasses like LoadBundleOperation, SaveChangesOperation or CheckSchemaOperation could extend this base class with specific functionality that we currently have inside the PersistenceManager (and other) classes. Database-specific extensions can be handled as yet another subclasses like OracleCheckSchemaOperation and the PersistenceManager classes would simply act as factories of these Operation instances instead of actually implementing the database functionality.
Jukka Zitting added a comment - 29/May/09 12:22 PM Reviewing a 2000+ line patch isn't too easy. Could you split it to smaller pieces? Also, feel free to commit the incremental changes directly to the branch in the sandbox. That way we can better label, comment and discuss each step separately instead of syncing up only on aggregate patches.
On the proposed changes: Good stuff, thanks! We're definitely seeing good progress here.
The main concern I have is about the Context concept you're introducing. I see where you're coming, but I think there's a better way to do this. The "context" of a method call is the object on which the method is called. How about, instead of passing the Context objects around, we actually moved the recipient methods *into* the Context class?
The Context class would then become something like a generic DatabaseOperation base class that encapsulates the database Connection being used for that operation. Subclasses like LoadBundleOperation, SaveChangesOperation or CheckSchemaOperation could extend this base class with specific functionality that we currently have inside the PersistenceManager (and other) classes. Database-specific extensions can be handled as yet another subclasses like OracleCheckSchemaOperation and the PersistenceManager classes would simply act as factories of these Operation instances instead of actually implementing the database functionality.
WDYT?
Unfortunately most of the patch is one big change - it modifies AbstractBundlePersistenceManager so it requires all it's subclasses to be adapted.
The context class was IMHO probably the easiest way to introduce connection pooling without requiring complete refactor/rewrite of persistence managers. Yet the patch is quite big.
I like the idea of operations. This would however be far bigger change that what I did. I thought the idea was to introduce connection pooling with minimal fuzz. Looks like I was wrong. I will look into this. I agree that if done properly this would be much cleaner solution than passing context object around.
Matej Knopp added a comment - 29/May/09 02:47 PM Thanks for the feedback.
Unfortunately most of the patch is one big change - it modifies AbstractBundlePersistenceManager so it requires all it's subclasses to be adapted.
The context class was IMHO probably the easiest way to introduce connection pooling without requiring complete refactor/rewrite of persistence managers. Yet the patch is quite big.
I like the idea of operations. This would however be far bigger change that what I did. I thought the idea was to introduce connection pooling with minimal fuzz. Looks like I was wrong. I will look into this. I agree that if done properly this would be much cleaner solution than passing context object around.
My concerns about too many changes were mostly about having the modifications go too much beyond the o.a.j.core.persistence package (and other database-related packages). What we do inside those packages is open for discussion, and I'd personally prefer to reach a clean design that's built with connection pooling in mind than to patch the current design to work with pooled connections.
Anyway, I think the Context class is a good starting point, and we can continue by refactoring until the design is better.
Fair point about the one big change. If you like you can commit the full patch as-is and we can work from there in svn.
Jukka Zitting added a comment - 30/May/09 07:56 AM My concerns about too many changes were mostly about having the modifications go too much beyond the o.a.j.core.persistence package (and other database-related packages). What we do inside those packages is open for discussion, and I'd personally prefer to reach a clean design that's built with connection pooling in mind than to patch the current design to work with pooled connections.
Anyway, I think the Context class is a good starting point, and we can continue by refactoring until the design is better.
Fair point about the one big change. If you like you can commit the full patch as-is and we can work from there in svn.
To reduce the risk of problems, what about creating new classes instead of patching the existing classes? Like that you could concentrate on one database type first. It's just an idea...
Thomas Mueller added a comment - 30/May/09 08:31 AM To reduce the risk of problems, what about creating new classes instead of patching the existing classes? Like that you could concentrate on one database type first. It's just an idea...
We're already working on a branch, so I'm not that worried about changing things. Let's see what we come up with and then consider whether the result should be merged back as-is or perhaps copied into a new package in trunk. We should probably also set up some extra integration tests that exercise all the database types that we can easily set up.
Jukka Zitting added a comment - 30/May/09 09:12 AM We're already working on a branch, so I'm not that worried about changing things. Let's see what we come up with and then consider whether the result should be merged back as-is or perhaps copied into a new package in trunk. We should probably also set up some extra integration tests that exercise all the database types that we can easily set up.
JCR-1388?The idea is to use a pool package or to build our own?