UIMA-2392

UIMA-AS CAS multiplier hangs fetching empty CAS after client timeouts

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1AS
    • Fix Version/s: 2.4.0AS
    • Component/s: Async Scaleout
    • Environment: RedHat Enterprise Linux 6.0, Mac OSX Lion 10.7.3

      Description

      I'm attaching a test case that reproduces the hang. See the README inside the zip for how to execute it.

      The annotator (ForwardJCas) consists of a simple CAS multiplier with delays in it, scaled out in its own thread within 2 levels of aggregates. It receives a CAS, sleeps for 3 seconds, gets a new empty CAS, copies information into it, sleeps again, and then returns the CAS. The unit test (UimaAsTest) initializes 40 UIMA-AS clients with a timeout of 8 seconds, sends 1 CAS from each to the service, and then waits. After 1-2 runs, the CAS multiplier winds up hanging on the getEmptyCas() call in the scaled-out annotator.
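      For illustration only, a minimal sketch of the kind of JCas multiplier described above (this is not the attached ForwardJCas source; the class and its details are assumptions based on the description):

      import org.apache.uima.analysis_component.JCasMultiplier_ImplBase;
      import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
      import org.apache.uima.cas.AbstractCas;
      import org.apache.uima.jcas.JCas;

      // Sketch of a CAS multiplier that sleeps, asks for an empty CAS, copies data
      // into it, sleeps again, and returns it, as outlined in the description.
      public class ForwardJCasSketch extends JCasMultiplier_ImplBase {

        private JCas pending; // input CAS waiting to be forwarded

        @Override
        public void process(JCas aJCas) throws AnalysisEngineProcessException {
          sleepQuietly(3000); // simulated work before requesting a new CAS
          pending = aJCas;
        }

        @Override
        public boolean hasNext() throws AnalysisEngineProcessException {
          return pending != null;
        }

        @Override
        public AbstractCas next() throws AnalysisEngineProcessException {
          // This is the call that blocks when the service-side CAS pool is exhausted.
          JCas out = getEmptyJCas();
          out.setDocumentText(pending.getDocumentText()); // copy information over
          sleepQuietly(3000); // simulated work before returning the CAS
          pending = null;
          return out;
        }

        private static void sleepQuietly(long ms) {
          try {
            Thread.sleep(ms);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        }
      }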

      What appears to be happening is that free CAS messages are not being sent from the client to the service in all cases when the timeouts occur. A similar defect (https://issues.apache.org/jira/browse/UIMA-1786) was fixed in 2.3.1 with slightly different symptoms.

      Attachments

      1. uima-hang.zip (22 kB, Peter Parente)
      2. logs.zip (183 kB, Peter Parente)

        Activity

        Peter Parente created issue -
        Peter Parente added a comment -

        Test case for hang.

        Peter Parente made changes -
        Attachment: uima-hang.zip [ 12525280 ]
        Peter Parente added a comment -

        Logs. Clean = run with timeout set to 0 (infinite). Hang = 2 client runs with timeout set to 8000ms as in the test case.

        Peter Parente made changes -
        Attachment: logs.zip [ 12525318 ]
        Peter Parente added a comment -

        Follow-up reading of the logs. Output CASes wind up in 3 situations:

        1. Service produces output CAS and sends it back to the client via the client's temporary response queue. Client receives the CAS. Client sends a free CAS message to the service. The service releases the CAS back into the pool.

        2. Service produces an output CAS and fails to send it back to the client because the client's temp response queue has closed (timeout, shutdown, etc.). The service releases the CAS back into the pool.

        3. Service produces an output CAS and sends it back to the client via the client's temporary response queue. Client is no longer listening to the queue (timeout, crash, etc.) and so never receives it nor sends a free message. The service, which successfully put it on the response queue, doesn't know that it should be released. The CAS is lost in the ether.

        Hit #3 too many times and you hang when the CAS pool is exhausted.
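        To make the bookkeeping in #1-#3 concrete, here is a schematic sketch (not UIMA-AS internals) of a bounded CAS pool whose permit is only returned on a free-CAS message or a failed send:

        import java.util.concurrent.Semaphore;

        // Schematic only: a bounded pool whose permits model CASes owned by the CM.
        class CasPoolSketch {
          private final Semaphore permits;

          CasPoolSketch(int poolSize) {
            permits = new Semaphore(poolSize);
          }

          // Models getEmptyCas(): blocks (hangs) once every permit is checked out.
          void acquireCas() throws InterruptedException {
            permits.acquire();
          }

          // Called on a free-CAS message from the client (#1) or when the send to the
          // temp reply queue fails (#2). In #3 the send succeeds but no client is left
          // to send the free-CAS message, so this is never called and the permit leaks.
          void releaseCas() {
            permits.release();
          }
        }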

        Jerry Cwiklik added a comment -

        Peter, #3 may be related to the fact that you are using a UIMA AS client instance per CAS, combined with a possible problem of the client not closing the broker connection when its stop() method is called. This leads to a temp queue leak, but also to the behavior you are seeing. Had the client actually disconnected from the broker, the UIMA AS service would have failed to send the child CAS (the temp queue would be gone) and would have released it back to the CM pool. Can you try to run with a single instance of the UIMA AS client as an interim fix?
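        A rough sketch of that interim approach, with one client shared by all worker threads (the broker URL, queue name, and context values below are illustrative assumptions, not taken from this issue):

        import java.util.HashMap;
        import java.util.Map;
        import org.apache.uima.aae.client.UimaAsynchronousEngine;
        import org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngine_impl;
        import org.apache.uima.cas.CAS;

        public class SharedClientSketch {
          public static void main(String[] args) throws Exception {
            // One client instance for the whole JVM instead of one per CAS.
            UimaAsynchronousEngine client = new BaseUIMAAsynchronousEngine_impl();

            Map<String, Object> ctx = new HashMap<String, Object>();
            ctx.put(UimaAsynchronousEngine.ServerUri, "tcp://localhost:61616"); // assumed broker URL
            ctx.put(UimaAsynchronousEngine.ENDPOINT, "ForwardJCasQueue");       // assumed service queue
            ctx.put(UimaAsynchronousEngine.Timeout, 8000);                      // process timeout in ms
            client.initialize(ctx);

            try {
              // All worker threads would reuse this same client.
              CAS cas = client.getCAS();
              cas.setDocumentText("example input");
              client.sendAndReceiveCAS(cas); // synchronous round trip
              cas.release();
            } finally {
              client.stop(); // called exactly once, when all work is done
            }
          }
        }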

        Jerry Cwiklik added a comment -

        In this scenario there are multiple uima-as client instances (same JVM, each client in a different thread) sending CASes and stopping due to simulated timeouts. All of them share the same Connection object (by design) but have different (dedicated) JMS Sessions, Consumers, and temp reply queues. The Connection is shared by all clients to reduce load on the broker: fewer threads, less context switching, etc.
        When any of the clients is stopped, its temp reply queue is not removed from the broker because the Connection is still open; uima-as closes the Connection only when all clients are stopped. This leads to two problems: a temp queue build-up/leak, and the possible CM hang described in this JIRA. The hang is caused by the service sending a CAS to a temp reply queue with no listener (the client was stopped).

        Modified the uima-as client to delete its temp reply queue while cleaning up during stop(). The AMQ connection class has an API, deleteTempDestination(), which enables temp queue removal. Now, when a client is stopped, the service is unable to deliver a CAS to the client's temp queue (it no longer exists) and releases it back to the CM's CAS pool.

        Tested the change numerous times with the supplied test case, with no hang.
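        For reference, a standalone illustration of the ActiveMQ API the fix relies on (this is only a sketch, not the actual UIMA-AS patch; the broker URL is an assumption):

        import javax.jms.Session;
        import javax.jms.TemporaryQueue;
        import org.apache.activemq.ActiveMQConnection;
        import org.apache.activemq.ActiveMQConnectionFactory;
        import org.apache.activemq.command.ActiveMQTempDestination;

        public class TempQueueCleanupSketch {
          public static void main(String[] args) throws Exception {
            // The single Connection shared by all uima-as clients in the JVM.
            ActiveMQConnection connection = (ActiveMQConnection)
                new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
            connection.start();

            // Each client gets its own Session and temporary reply queue.
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            TemporaryQueue replyQueue = session.createTemporaryQueue();

            // ... the client consumes replies from replyQueue until it is stopped ...

            // On stop(): explicitly remove the temp reply queue from the broker even
            // though the shared Connection stays open for the other clients.
            connection.deleteTempDestination((ActiveMQTempDestination) replyQueue);

            session.close();
            // connection.close() happens only after the last client in the JVM stops.
          }
        }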

        Jerry Cwiklik made changes -
        Status: Open → Closed
        Assignee: Jerry Cwiklik
        Fix Version/s: 2.4.0AS
        Resolution: Fixed
        Gino Bustelo added a comment -

        Jerry, I wanted to catch your attention on https://issues.apache.org/jira/browse/UIMA-2401. Different issue, but still a defect in the multiple uima-as client instances (same jvm - each client in a different thread) setup.


          People

          • Assignee: Jerry Cwiklik
          • Reporter: Peter Parente