Cocoon / COCOON-1985

AbstractCachingProcessingPipeline locking with IncludeTransformer may hang pipeline

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.9, 2.1.10, 2.1.11, 2.2
    • Fix Version/s: 2.1.12, 2.2
    • Component/s: Cocoon Core
    • Labels:
      None
    • Other Info:
      Patch available
    • Affects version (Component):
      Cocoon Core - 2.2.0-RC2
    • Fix version (Component):
      Cocoon Core - 2.2.0-RC3-dev

      Description

      Cocoon 2.1.9 introduced the concept of a lock in AbstractCachingProcessingPipeline, an optimization to prevent two concurrent requests from generating the same cached content. The first request adds the pipeline key to the transient cache to 'lock' the cache entry for that pipeline; subsequent concurrent requests wait for the first request to cache the content (by calling Object.wait() on the pipeline key entry) before proceeding, and can then use the newly cached content.

      However, this has introduced an incompatibility with the IncludeTransformer: if the inclusions access the same yet-to-be-cached content as the root pipeline, the whole assembly hangs, since the thread ends up waiting on a lock it already holds itself, which can never be satisfied.

      e.g.
      i) Root pipeline generates using sub-pipeline cocoon:/foo.xml.
      ii) The cocoon:/foo.xml sub-pipeline adds its pipeline key to the transient store as a lock.
      iii) Subsequently in the root pipeline, the IncludeTransformer is run.
      iv) One of the inclusions also generates with cocoon:/foo.xml; this sub-pipeline blocks in AbstractCachingProcessingPipeline.waitForLock() because the sub-pipeline key is already present.
      v) Deadlock.

      I've found a (partial, see below) solution for this: instead of a plain Object being added to the transient store as the lock object, Thread.currentThread() is added; when waitForLock() is called, if the lock object exists, it checks that it is not the same thread before attempting to wait on it; if it is the same thread, then waitForLock() returns success, which allows generation to proceed. You lose the efficiency of generating the cache only once in this case, but at least it doesn't hang! With JDK 1.5 this could be made neater by using Thread#holdsLock() instead of adding the thread object itself to the transient store.

      See patch file.
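
      Roughly, the idea looks like this (a sketch only, not the attached patch.txt; PIPELOCK_PREFIX and transientStore are the existing members of AbstractCachingProcessingPipeline, and the patched 2.1.11 source is quoted in a comment further down):

          protected boolean waitForLock(Object key) {
              Object lock = null;
              synchronized (transientStore) {
                  String lockKey = PIPELOCK_PREFIX + key;
                  if (transientStore.containsKey(lockKey)) {
                      lock = transientStore.get(lockKey);
                  }
              }
              // The added check: never wait on a lock that this thread stored itself.
              if (lock != null && lock != Thread.currentThread()) {
                  try {
                      synchronized (lock) {
                          lock.wait();   // another request is generating this cache entry
                      }
                  } catch (InterruptedException e) {
                      return false;      // caller retries the cache lookup
                  }
                  return false;          // retry: the other request should have cached the content
              }
              return true;               // no lock, or we hold it ourselves: proceed with generation
          }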

      However, even with this fix, parallel includes (when enabled) may still hang, because they pass the not-the-same-thread test but then fail: the root pipeline, which holds the initial lock, cannot complete (and therefore satisfy the lock condition for the parallel threads) until the threads themselves have completed, which results in a deadlock again.

      The complete solution is probably to avoid locking if the lock is held by the same top-level Request, but that requires more knowledge of Cocoon's processing than I (currently) have!

      IMHO, unless a complete solution to this is found, this optimization should be removed completely, or else made optional by configuration, since it renders the IncludeTransformer dangerous.
      Attachments

      1. sitemap.xmap
        1 kB
        Ellis Pritchard
      2. reproduceMultipleThreads.tar.gz
        2 kB
        Alexander Daniel
      3. patch.txt
        1.0 kB
        Ellis Pritchard
      4. includer.xsl
        0.3 kB
        Ellis Pritchard
      5. caching-trials.patch
        4 kB
        Alexander Klimetschek

          Activity

          Ellis Pritchard added a comment -
          The patch file may not be directly applicable (wrong root?), but the change should be simple enough (2 lines) to be understood anyway! :)
          Ellis Pritchard added a comment -
          Ok, attached are a very simple sitemap and XSL stylesheet which demonstrate the lock-up with the unpatched code.

          Stack-trace:
          Thread [http-41501-Processor25] (Suspended)
          owns: ToXMLSAXHandler (id=84)
          waiting for: Object (id=83)
          Object.wait(long) line: not available [native method]
          Object.wait() line: 429
          CachingProcessingPipeline(AbstractCachingProcessingPipeline).waitForLock(Object) line: 187
          CachingProcessingPipeline(AbstractCachingProcessingPipeline).validatePipeline(Environment) line: 684
          CachingProcessingPipeline(AbstractCachingProcessingPipeline).setupPipeline(Environment) line: 724
          CachingProcessingPipeline(AbstractProcessingPipeline).preparePipeline(Environment) line: 501
          CachingProcessingPipeline(AbstractProcessingPipeline).prepareInternal(Environment) line: 515
          SitemapSource.init() line: 342
          SitemapSource.<init>(ComponentManager, String, Map, Logger) line: 214
          SitemapSourceFactory.getSource(String, Map) line: 65
          SourceResolverImpl.resolveURI(String, String, Map) line: 208
          CocoonComponentManager.resolveURI(String, String, Map) line: 558
          HttpEnvironment(AbstractEnvironment).resolveURI(String, String, Map) line: 553
          HttpEnvironment(AbstractEnvironment).resolveURI(String) line: 540
          IncludeTransformer$IncludeElement.process0(ContentHandler, LexicalHandler) line: 560
          IncludeTransformer$IncludeElement.process(ContentHandler, LexicalHandler) line: 538
          IncludeTransformer$IncludeXMLPipe.endElement(String, String, String) line: 916
          IncludeTransformer(AbstractXMLPipe).endElement(String, String, String) line: 112
          ToXMLSAXHandler.endElement(String, String, String) line: 261
          ElemLiteralResult.execute(TransformerImpl) line: 1399
          TransformerImpl.executeChildTemplates(ElemTemplateElement, boolean) line: 2411
          ElemLiteralResult.execute(TransformerImpl) line: 1374
          TransformerImpl.executeChildTemplates(ElemTemplateElement, boolean) line: 2411
          TransformerImpl.applyTemplateToNode(ElemTemplateElement, ElemTemplate, int) line: 2281
          TransformerImpl.transformNode(int) line: 1367
          TransformerImpl.run() line: 3458
          TransformerHandlerImpl.endDocument() line: 406
          TraxTransformer(AbstractXMLPipe).endDocument() line: 56
          TraxTransformer.endDocument() line: 586
          EnvironmentChanger.endDocument() line: 119
          XMLTeePipe.endDocument() line: 71
          SAXParserImpl$JAXPSAXParser(AbstractSAXParser).endDocument(Augmentations) line: not available
          XMLNSDocumentScannerImpl(XMLDocumentScannerImpl).endEntity(String, Augmentations) line: not available
          XMLEntityManager.endEntity() line: not available
          XMLEntityScanner.load(int, boolean) line: not available
          XMLEntityScanner.skipSpaces() line: not available
          XMLDocumentScannerImpl$TrailingMiscDispatcher.dispatch(boolean) line: not available
          XMLNSDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: not available
          XML11Configuration.parse(boolean) line: not available
          XML11Configuration.parse(XMLInputSource) line: not available
          SAXParserImpl$JAXPSAXParser(XMLParser).parse(XMLInputSource) line: not available
          SAXParserImpl$JAXPSAXParser(AbstractSAXParser).parse(InputSource) line: not available
          SAXParserImpl$JAXPSAXParser.parse(InputSource) line: not available
          JaxpParser.parse(InputSource, ContentHandler, LexicalHandler) line: 315
          JaxpParser.parse(InputSource, ContentHandler) line: 334
          SourceUtil.parse(ServiceManager, Source, ContentHandler) line: 326
          FileGenerator.generate() line: 116
          CachingProcessingPipeline(AbstractCachingProcessingPipeline).processXMLPipeline(Environment) line: 333
          CachingProcessingPipeline(AbstractProcessingPipeline).process(Environment, XMLConsumer) line: 780
          SitemapSource.toSAX(ContentHandler) line: 413
          SourceUtil.toSAX(XMLizable, ContentHandler) line: 101
          SourceUtil.parse(ServiceManager, Source, ContentHandler) line: 321
          FileGenerator.generate() line: 116
          CachingProcessingPipeline(AbstractCachingProcessingPipeline).processXMLPipeline(Environment) line: 367
          CachingProcessingPipeline(AbstractProcessingPipeline).process(Environment) line: 481
          SerializeNode.invoke(Environment, InvokeContext) line: 121
          MatchNode(AbstractParentProcessingNode).invokeNodes(ProcessingNode[], Environment, InvokeContext, String, Map) line: 47
          MatchNode.invoke(Environment, InvokeContext) line: 108
          PipelineNode(AbstractParentProcessingNode).invokeNodes(ProcessingNode[], Environment, InvokeContext) line: 69
          PipelineNode.invoke(Environment, InvokeContext) line: 143
          PipelinesNode(AbstractParentProcessingNode).invokeNodes(ProcessingNode[], Environment, InvokeContext) line: 69
          PipelinesNode.invoke(Environment, InvokeContext) line: 93
          ConcreteTreeProcessor.process(Environment, InvokeContext) line: 235
          ConcreteTreeProcessor.process(Environment) line: 177
          TreeProcessor.process(Environment) line: 253
          MountNode.invoke(Environment, InvokeContext) line: 118
          MatchNode(AbstractParentProcessingNode).invokeNodes(ProcessingNode[], Environment, InvokeContext, String, Map) line: 47
          MatchNode.invoke(Environment, InvokeContext) line: 108
          PipelineNode(AbstractParentProcessingNode).invokeNodes(ProcessingNode[], Environment, InvokeContext) line: 69
          PipelineNode.invoke(Environment, InvokeContext) line: 143
          PipelinesNode(AbstractParentProcessingNode).invokeNodes(ProcessingNode[], Environment, InvokeContext) line: 69
          PipelinesNode.invoke(Environment, InvokeContext) line: 93
          ConcreteTreeProcessor.process(Environment, InvokeContext) line: 235
          ConcreteTreeProcessor.process(Environment) line: 177
          TreeProcessor.process(Environment) line: 253
          Cocoon.process(Environment) line: 699
          WispersCocoonServlet(CocoonServlet).service(HttpServletRequest, HttpServletResponse) line: 1155
          WispersCocoonServlet(HttpServlet).service(ServletRequest, ServletResponse) line: 853
          ApplicationFilterChain.internalDoFilter(ServletRequest, ServletResponse) line: 237
          ApplicationFilterChain.doFilter(ServletRequest, ServletResponse) line: 157
          StandardWrapperValve.invoke(Request, Response, ValveContext) line: 214
          StandardValveContext.invokeNext(Request, Response) line: 104
          StandardPipeline.invoke(Request, Response) line: 520
          StandardContextValve.invokeInternal(Wrapper, Request, Response) line: 198
          StandardContextValve.invoke(Request, Response, ValveContext) line: 144
          StandardValveContext.invokeNext(Request, Response) line: 104
          AccessLogValve.invoke(Request, Response, ValveContext) line: 535
          StandardValveContext.invokeNext(Request, Response) line: 102
          StandardPipeline.invoke(Request, Response) line: 520
          StandardHostValve.invoke(Request, Response, ValveContext) line: 137
          StandardValveContext.invokeNext(Request, Response) line: 104
          ErrorReportValve.invoke(Request, Response, ValveContext) line: 117
          StandardValveContext.invokeNext(Request, Response) line: 102
          StandardPipeline.invoke(Request, Response) line: 520
          StandardEngineValve.invoke(Request, Response, ValveContext) line: 109
          StandardValveContext.invokeNext(Request, Response) line: 104
          StandardPipeline.invoke(Request, Response) line: 520
          StandardEngine(ContainerBase).invoke(Request, Response) line: 929
          CoyoteAdapter.service(Request, Response) line: 160
          Http11Processor.process(InputStream, OutputStream) line: 799
          Http11Protocol$Http11ConnectionHandler.processConnection(TcpConnection, Object[]) line: 705
          TcpWorkerThread.runIt(Object[]) line: 577
          ThreadPool$ControlRunnable.run() line: 683
          ThreadWithAttributes(Thread).run() line: 534
          Alfred Nathaniel added a comment -
          A Java thread cannot deadlock with itself. Which is the other thread which holds the lock, and what is it waiting for?
          Alfred Nathaniel added a comment -
          Oops, deadlock in lock.wait is possible, of course.

          Patch applied. Thanks for the detailed analysis.
          Ellis Pritchard added a comment -
          Cheers ;)

          However, as I alluded to in the original report, my patch is *not* a full solution to this problem.

          I believe deadlock (this time with the main thread and sub-threads) will happen if the same test is run with the IncludeTransformer in parallel-processing mode. The reason is that the main thread cannot complete until the include threads have completed, and they can't complete because the main thread holds the pipeline lock.

          This should be easy to test by adding:

             <map:parameter name="parallel" value="true"/>

          to the sitemap inside the <map:transformer type="include"/>.

          It seems like my semi-patch fall-through logic should be modified to compare the thread's main Request against the Request of the pipeline holder.

          Maybe the original author of the code can help?
          Alexander Klimetschek added a comment -
          Hi, I experienced the same problem with a Cocoon 2.2 trunk version from December, applied the patch locally, and it helps!

          My situation is a bit different: I have two different views on a page (HTML and PDF) which are selected by the HTTP Accept header (REST style). Both use an XML document returned by another block via the block: protocol, but then style it with their own XSL stylesheets. There is no include transformer involved. The problem appeared when I clicked on the HTML page and then on the PDF version; once both had loaded initially, requesting either of them again hung. When debugging, I found that the thread hung at

          synchronized(lock) {
              lock.wait();
          }

          in AbstractCachingProcessingPipeline.java waitForLock(). I applied the short patch and I can no longer reproduce the problem.
          Ellis Pritchard added a comment -
          Added 2.2 to Affects/Fixes Versions
          Ellis Pritchard added a comment -
          A quick check reveals that this has been patched in trunk (2.2): http://svn.apache.org/viewvc/cocoon/trunk/core/cocoon-pipeline/cocoon-pipeline-impl/src/main/java/org/apache/cocoon/components/pipeline/impl/AbstractCachingProcessingPipeline.java?r1=490536&r2=498471&diff_format=h
          Sascha-Matthias Kulawik added a comment -
          I think the bug is still not fixed if parallel="true" is set (as written by Ellis Pritchard). Is the only solution to avoid parallel includes when reusing the same processing pipeline?
          I think the best solution would be to use the request, as Ellis mentioned, for locking.
          Alexander Klimetschek added a comment -
          Now the problem has returned for us... I cannot narrow down the sitemap constructs leading to the error, but it is related to AbstractCachingProcessingPipeline. Interestingly, we do *not* use the IncludeTransformer in parallel processing mode.

          I tried Ellis's suggestion of using the current Request object as the lock, but this does not help; requests hang the same way as before.
          Alexander Klimetschek added a comment -
          A patch for the discussion, not a final solution... I made waitForLock() fuzzy, so that it returns after 250 ms of waiting (via lock.wait(250)) and retries the wait 2 times; if that does not work, it accepts the fact that there is no cached response available. This avoids the deadlock, but will slow down some requests.
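
          Roughly, the change looks like this (a sketch of the approach only, not the attached caching-trials.patch; transientStore and PIPELOCK_PREFIX are the existing members of AbstractCachingProcessingPipeline):

              protected boolean waitForLock(Object key) {
                  Object lock = null;
                  if (transientStore != null) {
                      synchronized (transientStore) {
                          String lockKey = PIPELOCK_PREFIX + key;
                          if (transientStore.containsKey(lockKey)) {
                              lock = transientStore.get(lockKey);
                          }
                      }
                  }
                  if (lock == null || lock == Thread.currentThread()) {
                      return true;               // nothing to wait for, or we hold the lock ourselves
                  }
                  for (int attempt = 0; attempt < 2; attempt++) {
                      try {
                          synchronized (lock) {
                              lock.wait(250);    // bounded wait instead of an unbounded lock.wait()
                          }
                      } catch (InterruptedException e) {
                          return false;          // caller retries the cache lookup
                      }
                  }
                  // After ~500 ms, give up waiting: accept that no cached response is
                  // available and let this request generate the content itself.
                  return true;
              }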

          In my (special!?) situation this leads to an oscillating run: one request is answered quickly from the cache, the next request hits the wait problem, runs 2 times through the lock (~500 ms) and then rebuilds the content for the cache. So quick - slow - quick - slow... The setup here is only a single request! Nothing in parallel.

          The sitemap is quite simple, the problem appears when requesting foobar/ with REST=xml:

          <match pattern="*/">
            <act type="authorise" src="jcr:///teamspaces/{1}/tasklist"/>
            <select type="REST">
              <when test="html">
                <read src="cocoon:/internal/pipe/tasks/{1}/list.html" />
              </when>
              <when test="xml">
                <read src="cocoon:/internal/pipe/tasks/{1}/list.xml" mime-type="text/xml"/>
              </when>
            </select>
          </match>

          <match pattern="internal/pipe/tasks/*/list.xml">
            <generate type="collection" src="jcr:///teamspaces/{1}/tasks/">
              <parameter name="include" value=".*\.xml" />
            </generate>
            <transform src="xslt/xml/collection2tasklist.xsl">
              <parameter name="basePath" value="{system-property:mindquarry.server.url}{request:contextPath}{request:servletPath}/" />
              <parameter name="teamspace" value="{1}" />
            </transform>
            <serialize type="xml" />
          </match>
          Vadim Gritsenko added a comment -
          Fixed in 2.2 (trunk)
          Grzegorz Kossakowski added a comment -
          Is this still a problem in the 2.1.x branch? Why was the fix from trunk not applied to the 2.1.x branch?
          Alexander Daniel added a comment -
          From the source it looks like it is fixed in 2.1.11.

           54: * @version $Id: AbstractCachingProcessingPipeline.java 498470 2007-01-21 22:34:30Z anathaniel $
          ...
          183:     // Avoid deadlock with self (see JIRA COCOON-1985).
          184:     if(lock != null && lock != Thread.currentThread()) {
          185:         try {
          186:             // become owner of monitor
          187:             synchronized(lock) {
          188:                 lock.wait();
          189:             }
          190:         } catch (InterruptedException e) {
          191:             if(getLogger().isDebugEnabled()) {
          192:                 getLogger().debug("Got interrupted waiting for other pipeline to finish processing, retrying...", e);
          193:             }
          194:             return false;
          195:         }
          196:         if(getLogger().isDebugEnabled()) {
          197:             getLogger().debug("Other pipeline finished processing, retrying to get cached response.");
          198:         }
          199:         return false;
          200:     }
          Ellis Pritchard added a comment -
          I'm still not happy with this.

          1/ The original patch for 2.1.x had a problem (as I explained) where parallel includes can dead-lock; this is neatly by-passed in 2.2.x, in a way that is not possible in 2.1.x (see http://svn.apache.org/viewvc?view=rev&revision=601871). However...

          2/ I'm convinced that this code could still dead-lock: the waiter relies on being woken by lock.notifyAll(); however, this notification could conceivably happen between getting the lock object from the transient store and actually calling lock.wait(), and thus the waiter would never be woken by the thread that holds the lock:

              protected boolean waitForLock(Object key) {
                  if(transientStore != null) {
                      Object lock = null;
                      synchronized(transientStore) {
                          String lockKey = PIPELOCK_PREFIX+key;
                          if(transientStore.containsKey(lockKey)) {
                              // cache content is currently being generated, wait for other thread
                              lock = transientStore.get(lockKey);
                          }
                      }
                      /***
                       *** BIG SCARY UN-SYNCHRONIZED GAP
                       *** lock.notifyAll() could be called here, and we'd not hear about it.
                       ***/
                      // Avoid deadlock with self (see JIRA COCOON-1985).
                      if(lock != null && lock != Thread.currentThread()) {
                          try {
                              // become owner of monitor
                              synchronized(lock) {
                                  /*** WAIT UNTIL OTHER THREAD CALLS notifyAll() again ***/
                                  lock.wait();
                              }
                          } catch (InterruptedException e) {
                              if(getLogger().isDebugEnabled()) {
                                  getLogger().debug("Got interrupted waiting for other pipeline to finish processing, retrying...",e);
                              }
                              return false;
                          }
                          if(getLogger().isDebugEnabled()) {
                              getLogger().debug("Other pipeline finished processing, retrying to get cached response.");
                          }
                          return false;
                      }
                  }
                  return true;
              }


          Fortunately, because I used the Thread object as the lock object in my patch for 2.1.x, the unfortunate waiting thread should only (!) have to wait until the processor thread that set the lock is re-used for generating cacheable content (and runs notifyAll() in releaseLock()), but that could still be a significant period. However, with the new 2.2 patch, the RequestAttributes object is never going to be re-used, so it will never be notified, and we are in a complete dead-lock again.
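
          For reference, the release side mentioned above works roughly like this (a paraphrased sketch of releaseLock(), not the exact source; the pattern is simply to remove the lock entry from the transient store and wake any waiters):

              protected void releaseLock(Object key) {
                  if (transientStore != null && key != null) {
                      String lockKey = PIPELOCK_PREFIX + key;
                      Object lock = null;
                      synchronized (transientStore) {
                          if (transientStore.containsKey(lockKey)) {
                              lock = transientStore.get(lockKey);
                              transientStore.remove(lockKey);
                          }
                      }
                      if (lock != null) {
                          synchronized (lock) {
                              lock.notifyAll();   // wake threads blocked in waitForLock()
                          }
                      }
                  }
              }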

          A work-around (i.e. a hack) for this is, instead of using Object#wait(), to use Object#wait(long timeout), i.e. replace

                                  lock.wait();
          above, with
                                  lock.wait(1000);

          i.e. wait for notification, or 1000 ms (pick a number); the calling code loops calling waitForLock(), so exiting without being notified is safe; however, it's still a hack.

          Basically, the whole idea behind pipe-line locking is a great one, but this implementation is full of danger, and, worse, the sort of danger you only encounter on a very busy system. My original recommendation was to remove this code; I still stand by that.

          Alexander Daniel added a comment -
          Two requests can deadlock each other in Cocoon 2.1.11 (without use of parallel includes with the include transformer):
          * request A: generates the lock for 55933
          * request B: generates the lock for 58840
          * request B: waits for lock 55933, which is held by request A
          * request A: waits for lock 58840, which is held by request B


          I can reproduce this behaviour with Apache Bench and the following pipeline:
          * terminal 1: Apache Bench request A (ab -k -n 10000 -c 25 http://localhost:8888/samples/reproduceMultipleThreads/productOfferForDevice/55933/)
          * terminal 2: Apache Bench request B (ab -k -n 10000 -c 25 http://localhost:8888/samples/reproduceMultipleThreads/productOfferForDevice/58840/)
          * terminal 3: touching the two data files every second to invalidate the cache (while true; do echo -n "."; touch 55933.xml 58840.xml; sleep 1; done)

          * pipeline:
          <map:pipeline type="caching">
            <map:match pattern="productOfferForDevice*/*/">
              <map:generate src="cocoon:/exists/{2}.xml" label="a"/>
              <map:transform type="xsltc" src="productOfferIncludeDevice.xsl" label="b">
                <map:parameter name="noInc" value="{1}"/>
              </map:transform>
              <map:transform type="include" label="c"/>
              <map:serialize type="xml"/>
            </map:match>

            <map:match pattern="exists/**">
              <map:act type="resource-exists">
                <map:parameter name="url" value="{1}" />
                <map:generate src="{../1}" />
                <map:serialize type="xml" />
              </map:act>
              <!-- not found -->
              <map:generate src="dummy.xml" />
              <map:serialize type="xml" />
            </map:match>
          </map:pipeline>


          After some seconds the deadlock occurs ==>
          * Apache Bench requests run into a timeout

          * I can see the following pipe locks in the default transient store:
          PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/55933.xml?pipelinehash=-910770960103935149_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1 (class: org.mortbay.util.ThreadPool$PoolThread)
          PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/58840.xml?pipelinehash=-4996088883111986478_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1 (class: org.mortbay.util.ThreadPool$PoolThread)
          PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml (class: org.mortbay.util.ThreadPool$PoolThread)
          PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml (class: org.mortbay.util.ThreadPool$PoolThread)


          I added some logging to AbstractCachingProcessingPipeline.java which reconfirms the explanations above:
          INFO (2008-03-13) 13:50.16:072 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/55933/) PoolThread-47/AbstractCachingProcessingPipeline: generating lock PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/55933.xml?pipelinehash=-910770960103935149_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
          INFO (2008-03-13) 13:50.16:074 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/55933/) PoolThread-47/AbstractCachingProcessingPipeline: generating lock PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml
          INFO (2008-03-13) 13:50.16:075 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/58840/) PoolThread-6/AbstractCachingProcessingPipeline: generating lock PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/58840.xml?pipelinehash=-4996088883111986478_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
          INFO (2008-03-13) 13:50.16:075 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/58840/) PoolThread-6/AbstractCachingProcessingPipeline: generating lock PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml
          INFO (2008-03-13) 13:50.16:281 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/58840/) PoolThread-6/AbstractCachingProcessingPipeline: waiting for lock PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml
          INFO (2008-03-13) 13:50.16:304 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/55933/) PoolThread-47/AbstractCachingProcessingPipeline: waiting for lock PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml


          With the attached reproduceMultipleThreads.tar.gz you can reproduce the behaviour yourself:
          * download and extract Cocoon 2.1.11
          * cd $CocoonHome
          * ./build.sh
          * cd build/webapp/samples
          * tar -xzf $DownloadFolder/reproduceMultipleThreads.tar.gz
          * cd ../../..
          * ./cocoon.sh
          * open 3 terminals and cd into $CocoonHome/build/webapp/samples/reproduceMultipleThreads in each

          * dry run without invalidating the cache to see that everything is working:
            - terminal 1: ./terminal1.sh
            - terminal 2: ./terminal2.sh

          * run with invalidating the cache every second:
            - terminal 1: ./terminal1.sh
            - terminal 2: ./terminal2.sh
            - terminal 3: ./terminal3.sh
          * When Apache Bench has run into a timeout you can view the pipelocks with http://localhost:8888/samples/reproduceMultipleThreads/pipelocks


          We are currently facing this issue on our production servers.

          WHAT IS THE BEST WAY TO FIX THAT?
          * removing the pipeline locking code as Ellis suggested?
          * making waitForLock fuzzy
          * ...

          If the pipelock design is the same in Cocoon 2.2, the same deadlock could occur there.
          Alexander Daniel added a comment -
          Sample code to reproduce deadlock in Cocoon 2.1.11 with two requests
          Vadim Gritsenko added a comment -
          Alexander - if you take a look at the top of this page you will notice that the issue has been resolved in trunk (see: Fix version (Component): Cocoon Core - 2.2.0-RC3-dev). If you take a look at the trunk you can also see a test case which reproduced that issue.

          The fix, IIRC, is to synchronize on the top-level HTTP request. I planned to fix it on the branch too but so far have had no time to do it.
          Alexander Daniel added a comment -
          Thanks for the hint, Vadim. I will have a look into the trunk.

          In Cocoon 2.1.9 a deadlock could be reproduced with one single request running in one thread. This has been fixed in Cocoon 2.1.11. The issue I described uses two top-level HTTP requests which deadlock each other because they depend on the same resources, which they acquire in a different order. If Cocoon 2.2 trunk synchronizes on top-level HTTP requests, the same issue could occur there, from my understanding.
          Vadim Gritsenko added a comment -
          Interesting. It sounds like you are describing a related, but still different, issue. If that's the case then it can be present in trunk too. It would be appropriate to file a separate Jira issue for this.
          Alexander Daniel added a comment -
          I filed a separate Jira issue: https://issues.apache.org/jira/browse/COCOON-2173
          Vadim Gritsenko added a comment -
          Fix backported to 2.1 branch.
          Vadim Gritsenko added a comment -
          Both trunk and branch are patched for the 'BIG SCARY UN-SYNCHRONIZED GAP', and pipeline locking has been made optional (via the 'locking' parameter) and given a non-infinite wait time (via the 'locking-timeout' parameter), defaulting to 7 seconds. Please test, and if there are no more issues with it, I'll close this bug report.
          Ellis Pritchard added a comment -
          Thanks for the "BIG SCARY UNSYNCHRONIZED GAP" patch.

          I've confirmed that the problem (pre-patched) has been occurring on our live servers, resulting in hanging request processor threads, and ultimately server death.

          I think the patch fixes the race-condition, however the wait timeout should definitely save us!!
          Andrew Savory added a comment -
          Closed after discussion with Ellis.

            People

            • Assignee:
              Unassigned
              Reporter:
              Ellis Pritchard
            • Votes:
              0
              Watchers:
              1
