ODE-527

Failure recovery doesn't work while no serviceendpoint is registered

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.2
    • Fix Version/s: 1.3.4
    • Component/s: JBI Integration
    • Labels:
      None
    • Environment:
      Servicemix 3.3

      Description

      Given a process that INVOKEs an unregistered service endpoint, I got an exception that recurred endlessly, regardless of any activityRecovery settings such as faultOnFailure=true.

      12:10:01,683 | ERROR | pool-4-thread-1 | SimpleScheduler | duler.simple.SimpleScheduler$4 410 | Error while executing transaction
      org.apache.ode.bpel.iapi.Scheduler$JobProcessorException: java.lang.RuntimeException: org.apache.ode.bpel.iapi.ContextException: Unknown endpoint: {ABC}Abc:default
      at org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:425)
      at org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerImpl.java:377)
      at org.apache.ode.scheduler.simple.SimpleScheduler$4$1.call(SimpleScheduler.java:386)
      at org.apache.ode.scheduler.simple.SimpleScheduler$4$1.call(SimpleScheduler.java:380)
      at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:208)
      at org.apache.ode.scheduler.simple.SimpleScheduler$4.call(SimpleScheduler.java:379)
      at org.apache.ode.scheduler.simple.SimpleScheduler$4.call(SimpleScheduler.java:376)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
      at java.util.concurrent.FutureTask.run(FutureTask.java:123)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
      at java.lang.Thread.run(Thread.java:595)
      Caused by: java.lang.RuntimeException: org.apache.ode.bpel.iapi.ContextException: Unknown endpoint: {ABC}ABC:default
      at org.apache.ode.jacob.vpu.JacobVPU$JacobThreadImpl.run(JacobVPU.java:464)
      at org.apache.ode.jacob.vpu.JacobVPU.execute(JacobVPU.java:139)
      at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.execute(BpelRuntimeContextImpl.java:840)
      at org.apache.ode.bpel.engine.PartnerLinkMyRoleImpl.invokeNewInstance(PartnerLinkMyRoleImpl.java:206)
      at org.apache.ode.bpel.engine.BpelProcess.invokeProcess(BpelProcess.java:211)
      at org.apache.ode.bpel.engine.BpelProcess.handleWorkEvent(BpelProcess.java:384)
      at org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:415)
      ... 11 more
      Caused by: org.apache.ode.bpel.iapi.ContextException: Unknown endpoint: {ABC}ABC:default
      at org.apache.ode.jbi.JbiEndpointReference.getServiceEndpoint(JbiEndpointReference.java:99)
      at org.apache.ode.jbi.JbiEndpointReference.toXML(JbiEndpointReference.java:64)
      at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.invoke(BpelRuntimeContextImpl.java:759)
      at org.apache.ode.bpel.runtime.INVOKE.run(INVOKE.java:100)
      at sun.reflect.GeneratedMethodAccessor96.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:585)
      at org.apache.ode.jacob.vpu.JacobVPU$JacobThreadImpl.run(JacobVPU.java:451)
      ... 17 more

        Activity

        Rafal Rusin added a comment -

        I made a patch that solves this problem.

        Alex Boisvert added a comment -

        I don't think the patch is right. Shouldn't the IL return a failure back to the engine so that it can decide what to do on failure? Masking the exception seems like a bad idea.

        Rafal Rusin added a comment -

        Sorry for not being clear enough.

        The exception is masked only so that building the endpoint reference description in the engine does not cause a JobProcessorException. That description is used only to build a proper mex to store in the DB, which is later processed in the IL. The stack trace above refers to JbiEndpointReference.getServiceEndpoint, but that call is still part of the engine's job (BPEL evaluation).
        With this patch applied, later processing in the engine creates a mex with an empty service description. The mex is then processed in the IL (in another job), which leads to mex.replyWithFailure, because an empty serviceref destination doesn't exist. The IL (jbi) replyWithFailure is implemented correctly. The engine then receives this failure (in yet another job) and handles it as specified by the activityRecovery elements in the BPEL.

        After applying this patch and running a BPEL process containing an INVOKE to a nonexistent JBI endpoint with default activityRecovery settings, we get a failure count of 1 in the management API and an ERROR in the logs. This is what's expected, so I think masking the exception in this case is correct.

        Alex Boisvert added a comment -

        OK, I understand the intent better now. However, the current patch is not the most appropriate solution because it effectively loses the EPR information: if the EPR is valid but not currently resolvable, the MEX becomes non-recoverable.

        I agree that we need a safe JbiEndpointReference.toXML() method... I just think that if there's a failure, the EPR should be serialized to XML such that there is no loss of information and such that it can be resolved at a later time should the endpoint become available again.

        I suggest using the standard JBI internal endpoint reference XML schema (section 5.5.4.1 of the JBI spec):

        <jbi:end-point-reference service-name="qname" end-point-name="text"/>

        By the way, I just spotted an issue with the current code:

        DocumentFragment fragment = getServiceEndpoint().getAsReference(_type);

        The argument to getAsReference() should be null, not _type. The method expects an operation name whereas _type is an EPR type.
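
To illustrate the lossless fallback suggested in this comment, here is a minimal sketch that builds a jbi:end-point-reference element from the service name and endpoint name alone, without resolving the endpoint. It is not the ODE patch. The class name, method name, the `svc` prefix, and the namespace URI constant are assumptions made for the example; the URI in particular should be checked against section 5.5.4.1 of the JBI spec.

```java
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of ODE: serializes an unresolved endpoint
// as a jbi:end-point-reference so that no EPR information is lost.
final class JbiEprFallback {
    // Assumed namespace of the JBI internal endpoint reference schema.
    static final String JBI_EPR_NS = "http://java.sun.com/jbi/end-point-reference";
    // Standard namespace used for xmlns:* declarations in DOM.
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Builds the reference from names only; assumes a non-empty service namespace. */
    static Document toUnresolvedEpr(QName service, String endpointName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            Element epr = doc.createElementNS(JBI_EPR_NS, "jbi:end-point-reference");
            // Declare a prefix so the service-name attribute can carry a QName.
            epr.setAttributeNS(XMLNS_NS, "xmlns:svc", service.getNamespaceURI());
            epr.setAttribute("service-name", "svc:" + service.getLocalPart());
            epr.setAttribute("end-point-name", endpointName);
            doc.appendChild(epr);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable DOM parser available", e);
        }
    }
}
```

For the endpoint in the stack trace above, the call would be `JbiEprFallback.toUnresolvedEpr(new QName("ABC", "ABC"), "default")`. A safe toXML() could return this document whenever getServiceEndpoint() fails, so the reference can still be resolved later.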

        Rafal Rusin added a comment -

        Right, that's a good point about the endpoint becoming available later. I'll correct the patch.

        Rafal Rusin added a comment -

        I inspected the current ODE1X code for failure recovery, and it looks like serializing a proper jbi:end-point-reference is not needed, because the reference is re-evaluated on each failureRecovery retry (the whole invoke activity is re-executed).
        So this patch solves the problem well.
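
The argument in this comment can be sketched as a toy model. This is not ODE code: the registry map, the method names, and the `maxRetries` parameter are all hypothetical. The only point shown is that when every attempt looks the endpoint up afresh, an endpoint registered between attempts is found without any stored reference.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of "the whole invoke activity is re-executed on each retry".
final class RetryReresolveSketch {
    // Stand-in for the endpoint registry; keys look like "{namespace}service:endpoint".
    static final Map<String, String> registry = new ConcurrentHashMap<>();

    /** One invoke attempt: the endpoint is resolved again, nothing is cached. */
    static Optional<String> attempt(String endpointKey) {
        return Optional.ofNullable(registry.get(endpointKey));
    }

    /**
     * Runs the first attempt plus up to maxRetries retries.
     * Returns the 1-based number of the attempt that succeeded, or -1 if none did.
     * betweenAttempts runs after each failed attempt (stands in for the retry delay).
     */
    static int invokeWithRetries(String endpointKey, int maxRetries, Runnable betweenAttempts) {
        for (int attemptNo = 1; attemptNo <= 1 + maxRetries; attemptNo++) {
            if (attempt(endpointKey).isPresent()) {
                return attemptNo;
            }
            betweenAttempts.run();
        }
        return -1;
    }
}
```

In this model, if the endpoint is registered while the first attempt is waiting to be retried, the second attempt succeeds; if it is never registered, all attempts fail and the failure is left to the recovery settings.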

        Hudson added a comment -

        Integrated in ODE-1.x #46 (See http://hudson.zones.apache.org/hudson/job/ODE-1.x/46/)
        : Failure recovery doesn't work while no serviceendpoint is registered (jbi)


          People

          • Assignee: Rafal Rusin
          • Reporter: Rafal Rusin
          • Votes: 0
          • Watchers: 0