Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Incomplete
-
2.1.7
-
None
-
None
-
Operating System: All
Platform: All
-
35595
Description
Overview Description:
---------------------
Concurrent requests to pages containing nested queries cause
the SQLTransformer to acquire all available connections
from the database pool, making them wait forever for further
connections to become available.
The connections are never returned to the pool, resulting in
a hang of the webapp (deadlock) until Cocoon is restarted.
Steps to Reproduce:
-------------------
1) Build a default 2.1.7 Cocoon distribution.
2) Add your favorite database driver to web.xml and define
a 'small' database connection pool in cocoon.xconf, for
example: <pool-controller min="2" max="5"/>
3) Create a 'hang.xml' file with nested queries to feed through
the SQLTransformer. A dummy query which returns one row
(like 'SELECT 1') will do just fine! Example:
<?xml version="1.0"?>
<sql xmlns:sql="http://apache.org/cocoon/SQL/2.0">
<sql:execute-query>
<sql:query name="query1">SELECT 1</sql:query>
<sql:execute-query>
<sql:query name="query2">SELECT 2</sql:query>
<sql:execute-query>
<sql:query name="query3">SELECT 3</sql:query>
</sql:execute-query>
</sql:execute-query>
</sql:execute-query>
</sql>
4) Define a pipeline in the root sitemap which will run the
above file through the SQLTransformer:
<map:match pattern="hang">
<map:generate src="hang.xml"/>
<map:transform type="sql">
<map:parameter name="use-connection" value="dbpool"/>
</map:transform>
<map:serialize type="text"/>
</map:match>
Don't forget to test it!
Requesting the url 'http://localhost:8888/hang' with a web-
browser should return: '123'
5) Now run a http benchmarking tool (ab for instance) with multiple
concurrent requests against the above page, for example:
ab -n 100 -c 5 http://localhost:8888/hang
This should launch 5 concurrent, and 100 total, requests.
See what happens! (Timeout?)
Try to reload the http://localhost:8888/hang url with your
web-browser, and see what happens! (Timeout?)
Wait a few minutes, a few hours or a few days, try to access
the above url again, and see what happens! (Timeout?)
Actual Results:
---------------
Any subsequent request to the above url, or more generally
to a page which requires a database connection from the pool
('dbpool' in the example) will hang forever!
In a dynamic webapp, where every page request requires accessing
the database, the above situation will result in a hang of your
whole webapp ...until you restart Cocoon.
Expected Results:
-----------------
123 ;-)
It is acceptable for the webapp to respond slowly, or to timeout/
complain about requests which cannot be handled under heavy load!
But the webapp should not hang forever, requiring a restart of
Cocoon!
Build Date & Platform:
----------------------
I reproduced the behaviour with the current Cocoon 2.1.7 release
on both WinXP and Linux RedHat, running Java 1.4.2, and with
PostgreSQL 7.4.6 and Oracle9i as database backends.
Given the diversity of OSes and DBs, it really doesn't smell
like a platform-related problem!
Additional Information:
-----------------------
Included is the post I made to the users@cocoon.apache.org list
prior to writing this bug report.
It describes the problem in more detail, a brutal workaround,
and a possible explanation of what is happening:
----------------------------------------------------------------------------
Hello again,
The deadlock situation you describe is a plausible explanation for the
experienced Cocoon 'hangs'!
I took a deep breath and had a look into SQLTransformer.java, and I stumbled
upon the following code snippet in the executeQuery() method:
try {
if (index == 0) {
if (this.conn == null) // The first top level execute-query
this.conn = query.getConnection();
// reuse the global connection for all top level queries
conn = this.conn;
}
else // index > 0, sub queries are always executed in an own connection
conn = query.getConnection();
query.setConnection(conn);
query.execute();
} ...
Effectively, a new connection is being requested for every nested subquery!
What perplexes me, is that the getConnection() method appears to go to quite
some lengths to ensure that it won't block if it can't acquire a connection
after waiting/retrying a few times.
Normally it should complain (ie. write to the logs) and abort, but that never
happens!
I am going to post this as a bug report.
Thanks for the prompt response,
Fabrizio
>> Hello,
>>
>> I'm developing a dynamic Cocoon webapp that performs a lot of (nested)
>> queries using the SQLTransformer.
>>
>> The SQLTransformer performs SELECT operations only, and it is configured to
>> use a JDBC database connection pool that is managed by Cocoon itself.
>> Excerpt from cocoon.xconf:
>>
>> <datasources>
>> <jdbc logger="core.datasources.db" name="db">
>> <pool-controller min="5" max="50"/>
>> ...
>> </jdbc>
>> </datasources>
>>
>>
>> At some point, I decided to stress-test the webapp using the 'ab'-tool
>> (Apache benchmark) to simulate a large number of concurrent requests.
>>
>> When the number of concurrent requests is heading towards the 'max'
>> number of pooled database connections, the following occurs:
>>
>> - The 'ab' benchmark is aborted with a timeout error.
>>
>> - Using a database monitoring tool, you will notice that the 'max' number of
>> pooled database connections have been left open.
>>
>> - Cocoon 'hangs' indefinitely!
>> Any further requests to the webapp will timeout, and nothing gets written to
>> the logs (access.log, error.log...)
>> There are no warnings, nor any error messages in the logs that would
>> indicate why the hang occured in the first place. Up to the hang, everything
>> appears to run normally!
>>
>> That's scary!
>>
>>
>> Currently, I work around the problem with a big axe! That is, by limiting the
>> number of socket listener threads of the servlet container, and by defining a
>> rather oversized pool of database connections.
>> Excerpt from Jetty's main.xml configuration file:
>>
>> <Call name="addListener">
>> <Arg>
>> <New class="org.mortbay.http.SocketListener">
>> <Set name="MinThreads">8</Set>
>> <Set name="MaxThreads">16</Set>
>> ...
>> </New>
>> </Arg>
>> </Call>
>>
>>
>> My best guess is that the SQLTransformer will wait indefinitely (causing
>> Cocoon to hang) if it fails to get a database connection from the pool.
>>
>> The SQLTransformer further appears to require more than one database
>> connection (...which seems to be related to the depth of query nesting levels,
>> but that's a far-fetched guess!)
>>
>> Has someone out there experienced similar problems, or have a more
>> enlightening explanation to what is going on?
>>
>> As my fix is based on a lot of guesswork, I would welcome a more robust solution.
>>
>>
>> Best regards,
>> Fabrizio
>
> It sounds like SQLTransformer naively gets a new connection per query.
> This in turn requires that if your page has 4 nested queries then
> SQLTransformer is using 4 of your 50 connections. It only takes 13
> simultaneous requests to use up all the connections at that rate, and if
> all your requests have anywhere between 1 and 3 connections reserved and
> are not finished yet, none of the pages will release the connection and
> a deadlock occurs. If you have as many as 10 nested queries it is
> obvious to see how something like this could occur.
>
> I think the ESQL logicsheet suffered from this many moons ago, but
> resolved it by requesting one connection per page. BTW, Microsoft's
> feeble driver is limited to one connection per query--so your choice is
> to use jTDS or some other company's version.
>
> Nonetheless, it seems like there is a bug in SQLTransformer. Submit a
> bug report, and hopefully it can get resolved.
>
> --
>
> Design is a funny word. Some people think design means how it looks.
> But of course, if you dig deeper, it's really how it works.
>
> -- Steve Jobs
---------------------
Concurrent requests to pages containing nested queries cause
the SQLTransformer to acquire all available connections
from the database pool, making them wait forever for further
connections to become available.
The connections are never returned to the pool, resulting in
a hang of the webapp (deadlock) until Cocoon is restarted.
Steps to Reproduce:
-------------------
1) Build a default 2.1.7 Cocoon distribution.
2) Add your favorite database driver to web.xml and define
a 'small' database connection pool in cocoon.xconf, for
example: <pool-controller min="2" max="5"/>
3) Create a 'hang.xml' file with nested queries to feed through
the SQLTransformer. A dummy query which returns one row
(like 'SELECT 1') will do just fine! Example:
<?xml version="1.0"?>
<sql xmlns:sql="http://apache.org/cocoon/SQL/2.0">
<sql:execute-query>
<sql:query name="query1">SELECT 1</sql:query>
<sql:execute-query>
<sql:query name="query2">SELECT 2</sql:query>
<sql:execute-query>
<sql:query name="query3">SELECT 3</sql:query>
</sql:execute-query>
</sql:execute-query>
</sql:execute-query>
</sql>
4) Define a pipeline in the root sitemap which will run the
above file through the SQLTransformer:
<map:match pattern="hang">
<map:generate src="hang.xml"/>
<map:transform type="sql">
<map:parameter name="use-connection" value="dbpool"/>
</map:transform>
<map:serialize type="text"/>
</map:match>
Don't forget to test it!
Requesting the url 'http://localhost:8888/hang' with a web-
browser should return: '123'
5) Now run a http benchmarking tool (ab for instance) with multiple
concurrent requests against the above page, for example:
ab -n 100 -c 5 http://localhost:8888/hang
This should launch 5 concurrent, and 100 total, requests.
See what happens! (Timeout?)
Try to reload the http://localhost:8888/hang url with your
web-browser, and see what happens! (Timeout?)
Wait a few minutes, a few hours or a few days, try to access
the above url again, and see what happens! (Timeout?)
Actual Results:
---------------
Any subsequent request to the above url, or more generally
to a page which requires a database connection from the pool
('dbpool' in the example) will hang forever!
In a dynamic webapp, where every page request requires accessing
the database, the above situation will result in a hang of your
whole webapp ...until you restart Cocoon.
Expected Results:
-----------------
123 ;-)
It is acceptable for the webapp to respond slowly, or to timeout/
complain about requests which cannot be handled under heavy load!
But the webapp should not hang forever, requiring a restart of
Cocoon!
Build Date & Platform:
----------------------
I reproduced the behaviour with the current Cocoon 2.1.7 release
on both WinXP and Linux RedHat, running Java 1.4.2, and with
PostgreSQL 7.4.6 and Oracle9i as database backends.
Given the diversity of OSes and DBs, it really doesn't smell
like a platform-related problem!
Additional Information:
-----------------------
Included is the post I made to the users@cocoon.apache.org list
prior to writing this bug report.
It describes the problem in more detail, a brutal workaround,
and a possible explanation of what is happening:
----------------------------------------------------------------------------
Hello again,
The deadlock situation you describe is a plausible explanation for the
experienced Cocoon 'hangs'!
I took a deep breath and had a look into SQLTransformer.java, and I stumbled
upon the following code snippet in the executeQuery() method:
try {
if (index == 0) {
if (this.conn == null) // The first top level execute-query
this.conn = query.getConnection();
// reuse the global connection for all top level queries
conn = this.conn;
}
else // index > 0, sub queries are always executed in an own connection
conn = query.getConnection();
query.setConnection(conn);
query.execute();
} ...
Effectively, a new connection is being requested for every nested subquery!
What perplexes me, is that the getConnection() method appears to go to quite
some lengths to ensure that it won't block if it can't acquire a connection
after waiting/retrying a few times.
Normally it should complain (ie. write to the logs) and abort, but that never
happens!
I am going to post this as a bug report.
Thanks for the prompt response,
Fabrizio
>> Hello,
>>
>> I'm developing a dynamic Cocoon webapp that performs a lot of (nested)
>> queries using the SQLTransformer.
>>
>> The SQLTransformer performs SELECT operations only, and it is configured to
>> use a JDBC database connection pool that is managed by Cocoon itself.
>> Excerpt from cocoon.xconf:
>>
>> <datasources>
>> <jdbc logger="core.datasources.db" name="db">
>> <pool-controller min="5" max="50"/>
>> ...
>> </jdbc>
>> </datasources>
>>
>>
>> At some point, I decided to stress-test the webapp using the 'ab'-tool
>> (Apache benchmark) to simulate a large number of concurrent requests.
>>
>> When the number of concurrent requests is heading towards the 'max'
>> number of pooled database connections, the following occurs:
>>
>> - The 'ab' benchmark is aborted with a timeout error.
>>
>> - Using a database monitoring tool, you will notice that the 'max' number of
>> pooled database connections have been left open.
>>
>> - Cocoon 'hangs' indefinitely!
>> Any further requests to the webapp will timeout, and nothing gets written to
>> the logs (access.log, error.log...)
>> There are no warnings, nor any error messages in the logs that would
>> indicate why the hang occured in the first place. Up to the hang, everything
>> appears to run normally!
>>
>> That's scary!
>>
>>
>> Currently, I work around the problem with a big axe! That is, by limiting the
>> number of socket listener threads of the servlet container, and by defining a
>> rather oversized pool of database connections.
>> Excerpt from Jetty's main.xml configuration file:
>>
>> <Call name="addListener">
>> <Arg>
>> <New class="org.mortbay.http.SocketListener">
>> <Set name="MinThreads">8</Set>
>> <Set name="MaxThreads">16</Set>
>> ...
>> </New>
>> </Arg>
>> </Call>
>>
>>
>> My best guess is that the SQLTransformer will wait indefinitely (causing
>> Cocoon to hang) if it fails to get a database connection from the pool.
>>
>> The SQLTransformer further appears to require more than one database
>> connection (...which seems to be related to the depth of query nesting levels,
>> but that's a far-fetched guess!)
>>
>> Has someone out there experienced similar problems, or have a more
>> enlightening explanation to what is going on?
>>
>> As my fix is based on a lot of guesswork, I would welcome a more robust solution.
>>
>>
>> Best regards,
>> Fabrizio
>
> It sounds like SQLTransformer naively gets a new connection per query.
> This in turn requires that if your page has 4 nested queries then
> SQLTransformer is using 4 of your 50 connections. It only takes 13
> simultaneous requests to use up all the connections at that rate, and if
> all your requests have anywhere between 1 and 3 connections reserved and
> are not finished yet, none of the pages will release the connection and
> a deadlock occurs. If you have as many as 10 nested queries it is
> obvious to see how something like this could occur.
>
> I think the ESQL logicsheet suffered from this many moons ago, but
> resolved it by requesting one connection per page. BTW, Microsoft's
> feeble driver is limited to one connection per query--so your choice is
> to use jTDS or some other company's version.
>
> Nonetheless, it seems like there is a bug in SQLTransformer. Submit a
> bug report, and hopefully it can get resolved.
>
> --
>
> Design is a funny word. Some people think design means how it looks.
> But of course, if you dig deeper, it's really how it works.
>
> -- Steve Jobs