Uploaded image for project: 'Cocoon'
  1. Cocoon
  2. COCOON-1550

SQLTransformer makes Cocoon hang under high load

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Incomplete
    • 2.1.7
    • None
    • Blocks: Databases
    • None
    • Operating System: All
      Platform: All
    • 35595

    Description

      Overview Description:
      ---------------------

      Concurrent requests to pages containing nested queries cause
      the SQLTransformer to acquire all available connections
      from the database pool, making them wait forever for further
      connections to become available.
      The connections are never returned to the pool, resulting in
      a hang of the webapp (deadlock) until Cocoon is restarted.


      Steps to Reproduce:
      -------------------

      1) Build a default 2.1.7 Cocoon distribution.

      2) Add your favorite database driver to web.xml and define
         a 'small' database connection pool in cocoon.xconf, for
         example: <pool-controller min="2" max="5"/>
         
      3) Create a 'hang.xml' file with nested queries to feed through
         the SQLTransformer. A dummy query which returns one row
         (like 'SELECT 1') will do just fine! Example:
           
          <?xml version="1.0"?>
          <sql xmlns:sql="http://apache.org/cocoon/SQL/2.0">
              <sql:execute-query>
                  <sql:query name="query1">SELECT 1</sql:query>
                  <sql:execute-query>
                      <sql:query name="query2">SELECT 2</sql:query>
                      <sql:execute-query>
                         <sql:query name="query3">SELECT 3</sql:query>
                      </sql:execute-query>
                  </sql:execute-query>
              </sql:execute-query>
          </sql>

      4) Define a pipeline in the root sitemap which will run the
         above file through the SQLTransformer:
         
           <map:match pattern="hang">
              <map:generate src="hang.xml"/>
              <map:transform type="sql">
                <map:parameter name="use-connection" value="dbpool"/>
              </map:transform>
              <map:serialize type="text"/>
           </map:match>
        
         Don't forget to test it!

         Requesting the url 'http://localhost:8888/hang&#39; with a web-
         browser should return: '123'


      5) Now run a http benchmarking tool (ab for instance) with multiple
         concurrent requests against the above page, for example:
         
           ab -n 100 -c 5 http://localhost:8888/hang
         
         This should launch 5 concurrent, and 100 total, requests.
         See what happens! (Timeout?)
         Try to reload the http://localhost:8888/hang url with your
         web-browser, and see what happens! (Timeout?)
         Wait a few minutes, a few hours or a few days, try to access
         the above url again, and see what happens! (Timeout?)


      Actual Results:
      ---------------

         Any subsequent request to the above url, or more generally
         to a page which requires a database connection from the pool
         ('dbpool' in the example) will hang forever!

         In a dynamic webapp, where every page request requires accessing
         the database, the above situation will result in a hang of your
         whole webapp ...until you restart Cocoon.
          

      Expected Results:
      -----------------

         123 ;-)

         It is acceptable for the webapp to respond slowly, or to timeout/
         complain about requests which cannot be handled under heavy load!

         But the webapp should not hang forever, requiring a restart of
         Cocoon!
         

      Build Date & Platform:
      ----------------------

         I reproduced the behaviour with the current Cocoon 2.1.7 release
         on both WinXP and Linux RedHat, running Java 1.4.2, and with
         PostgreSQL 7.4.6 and Oracle9i as database backends.
         Given the diversity of OSes and DBs, it really doesn't smell
         like a platform-related problem!
         
        
      Additional Information:
      -----------------------

         Included is the post I made to the users@cocoon.apache.org list
         prior to writing this bug report.
         It describes the problem in more detail, a brutal workaround,
         and a possible explanation of what is happening:


      ----------------------------------------------------------------------------
         
      Hello again,

      The deadlock situation you describe is a plausible explanation for the
      experienced Cocoon 'hangs'!

      I took a deep breath and had a look into SQLTransformer.java, and I stumbled
      upon the following code snippet in the executeQuery() method:

          try {
              if (index == 0) {
                  if (this.conn == null) // The first top level execute-query
                      this.conn = query.getConnection();
                  // reuse the global connection for all top level queries
                  conn = this.conn;
              }
              else // index > 0, sub queries are always executed in an own connection
                  conn = query.getConnection();

              query.setConnection(conn);
              query.execute();
          } ...


      Effectively, a new connection is being requested for every nested subquery!

      What perplexes me, is that the getConnection() method appears to go to quite
      some lengths to ensure that it won't block if it can't acquire a connection
      after waiting/retrying a few times.
      Normally it should complain (ie. write to the logs) and abort, but that never
      happens!

      I am going to post this as a bug report.


      Thanks for the prompt response,
      Fabrizio

      >> Hello,
      >>
      >> I'm developing a dynamic Cocoon webapp that performs a lot of (nested)
      >> queries using the SQLTransformer.
      >>
      >> The SQLTransformer performs SELECT operations only, and it is configured to
      >> use a JDBC database connection pool that is managed by Cocoon itself.
      >> Excerpt from cocoon.xconf:
      >>
      >> <datasources>
      >> <jdbc logger="core.datasources.db" name="db">
      >> <pool-controller min="5" max="50"/>
      >> ...
      >> </jdbc>
      >> </datasources>
      >>
      >>
      >> At some point, I decided to stress-test the webapp using the 'ab'-tool
      >> (Apache benchmark) to simulate a large number of concurrent requests.
      >>
      >> When the number of concurrent requests is heading towards the 'max'
      >> number of pooled database connections, the following occurs:
      >>
      >> - The 'ab' benchmark is aborted with a timeout error.
      >>
      >> - Using a database monitoring tool, you will notice that the 'max' number of
      >> pooled database connections have been left open.
      >>
      >> - Cocoon 'hangs' indefinitely!
      >> Any further requests to the webapp will timeout, and nothing gets written to
      >> the logs (access.log, error.log...)
      >> There are no warnings, nor any error messages in the logs that would
      >> indicate why the hang occured in the first place. Up to the hang, everything
      >> appears to run normally!
      >>
      >> That's scary!
      >>
      >>
      >> Currently, I work around the problem with a big axe! That is, by limiting the
      >> number of socket listener threads of the servlet container, and by defining a
      >> rather oversized pool of database connections.
      >> Excerpt from Jetty's main.xml configuration file:
      >>
      >> <Call name="addListener">
      >> <Arg>
      >> <New class="org.mortbay.http.SocketListener">
      >> <Set name="MinThreads">8</Set>
      >> <Set name="MaxThreads">16</Set>
      >> ...
      >> </New>
      >> </Arg>
      >> </Call>
      >>
      >>
      >> My best guess is that the SQLTransformer will wait indefinitely (causing
      >> Cocoon to hang) if it fails to get a database connection from the pool.
      >>
      >> The SQLTransformer further appears to require more than one database
      >> connection (...which seems to be related to the depth of query nesting levels,
      >> but that's a far-fetched guess!)
      >>
      >> Has someone out there experienced similar problems, or have a more
      >> enlightening explanation to what is going on?
      >>
      >> As my fix is based on a lot of guesswork, I would welcome a more robust solution.
      >>
      >>
      >> Best regards,
      >> Fabrizio
      >
      > It sounds like SQLTransformer naively gets a new connection per query.
      > This in turn requires that if your page has 4 nested queries then
      > SQLTransformer is using 4 of your 50 connections. It only takes 13
      > simultaneous requests to use up all the connections at that rate, and if
      > all your requests have anywhere between 1 and 3 connections reserved and
      > are not finished yet, none of the pages will release the connection and
      > a deadlock occurs. If you have as many as 10 nested queries it is
      > obvious to see how something like this could occur.
      >
      > I think the ESQL logicsheet suffered from this many moons ago, but
      > resolved it by requesting one connection per page. BTW, Microsoft's
      > feeble driver is limited to one connection per query--so your choice is
      > to use jTDS or some other company's version.
      >
      > Nonetheless, it seems like there is a bug in SQLTransformer. Submit a
      > bug report, and hopefully it can get resolved.
      >
      > --
      >
      > Design is a funny word. Some people think design means how it looks.
      > But of course, if you dig deeper, it's really how it works.
      >
      > -- Steve Jobs

      Attachments

        Activity

          People

            Unassigned Unassigned
            fabrizio.sitzia@chem.lu Fabrizio Sitzia
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: