KARAF-327

Graceful shutdown of Windows service, revisited

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.2
    • Fix Version/s: 2.1.3, 2.2.0
    • Component/s: karaf-os-integration
    • Labels:
      None
    • Environment:

      Windows XP SP 3

      Description

      I'm trying to make my Karaf service more resilient to shutdown. In particular, I want my Camel routes to shut down gracefully. I started this discussion a few months ago; see http://www.mail-archive.com/user@karaf.apache.org/msg00084.html.

      Guillaume created the JIRA ticket KARAF-176 which is also resolved. I haven't had time to test this until now.

      It seems to me that the problem persists. I'm now using Karaf 2.1.2 and Camel 2.5.0. When I run Karaf from the command line and terminate it by pressing <ctrl-D>, Camel shuts down gracefully. It looks something like this:

      2010-12-09 13:04:51,737 | INFO | FelixStartLevel | DefaultShutdownStrategy | mel.impl.DefaultShutdownStrategy 114 | Starting to graceful shutdown 1 routes (timeout 300 seconds)
      2010-12-09 13:04:51,737 | INFO | 1 - ShutdownTask | DefaultShutdownStrategy | ultShutdownStrategy$ShutdownTask 383 | Route: route1 suspended and shutdown deferred, was consuming from: Endpointfile://C:\dev\Karaf\connect\common/data/interfaces/sample/file2file?delay=10000&include=%28%3Fi%29.*%28%3F%3C%21%5C.TMP%29&move=archive%2F%24%7Bdate%3Anow%3AyyyyMMdd%7D%2F%24%7Bfile%3Aonlyname%7D&moveFailed=failed%2F%24%7Bfile%3Aonlyname.noext%7D-%24%7Bdate%3Anow%3AyyyyMMddHHmmssSSS%7D.%24%7Bfile%3Aext%7D
      2010-12-09 13:04:51,737 | INFO | 1 - ShutdownTask | DefaultShutdownStrategy | ultShutdownStrategy$ShutdownTask 422 | Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 300 seconds.
      2010-12-09 13:04:52,737 | INFO | 1 - ShutdownTask | DefaultShutdownStrategy | ultShutdownStrategy$ShutdownTask 422 | Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 299 seconds.
      2010-12-09 13:04:53,737 | INFO | 1 - ShutdownTask | DefaultShutdownStrategy | ultShutdownStrategy$ShutdownTask 422 | Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 298 seconds.
      ...
      2010-12-09 13:06:54,782 | INFO | 1 - ShutdownTask | DefaultShutdownStrategy | ultShutdownStrategy$ShutdownTask 422 | Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 177 seconds.
      2010-12-09 13:06:55,782 | INFO | 1 - ShutdownTask | DefaultShutdownStrategy | ultShutdownStrategy$ShutdownTask 442 | Route: route1 shutdown complete.
      2010-12-09 13:06:55,782 | INFO | FelixStartLevel | DefaultShutdownStrategy | mel.impl.DefaultShutdownStrategy 146 | Graceful shutdown of 1 routes completed in 124 seconds
      2010-12-09 13:06:55,798 | INFO | FelixStartLevel | DefaultInflightRepository | l.impl.DefaultInflightRepository 93 | Shutting down with no inflight exchanges.
      2010-12-09 13:06:55,798 | INFO | FelixStartLevel | DefaultCamelContext | e.camel.impl.DefaultCamelContext 1374 | Uptime: 2 minutes
      2010-12-09 13:06:55,798 | INFO | FelixStartLevel | DefaultCamelContext | e.camel.impl.DefaultCamelContext 1375 | Apache Camel 2.5.0 (CamelContext: Sample file transfer from file to file) is shutdown in 2 minutes
      ...

      But when Karaf is running as a service (I run on Windows XP SP 3) and I stop the service from the control panel, Karaf, and therefore Camel, is abruptly killed. It looks like this:

      2010-12-09 12:36:11,103 | INFO | FelixStartLevel | DefaultShutdownStrategy | mel.impl.DefaultShutdownStrategy 114 | Starting to graceful shutdown 1 routes (timeout 300 seconds)
      2010-12-09 12:36:11,103 | INFO | 4 - ShutdownTask | DefaultShutdownStrategy | ultShutdownStrategy$ShutdownTask 383 | Route: route2 suspended and shutdown deferred, was consuming from: Endpointfile://C:\dev\Karaf\connect\common/data/interfaces/sample/file2file?delay=10000&include=%28%3Fi%29.*%28%3F%3C%21%5C.TMP%29&move=archive%2F%24%7Bdate%3Anow%3AyyyyMMdd%7D%2F%24%7Bfile%3Aonlyname%7D&moveFailed=failed%2F%24%7Bfile%3Aonlyname.noext%7D-%24%7Bdate%3Anow%3AyyyyMMddHHmmssSSS%7D.%24%7Bfile%3Aext%7D
      2010-12-09 12:36:11,103 | INFO | 4 - ShutdownTask | DefaultShutdownStrategy | ultShutdownStrategy$ShutdownTask 422 | Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 300 seconds.

      Nothing more is logged since the process is killed. I know that there is never a guarantee that Karaf will shut down gracefully, since the process might just die (e.g. on loss of power). But it would be nice if the control panel could be used for stopping the service, since that's what most operations engineers do. There is always the option to log in to Karaf via SSH and issue the shutdown command before stopping the service, but then it becomes too complicated for most people.

      1. camel-route.xml
        0.8 kB
        Achim Nierbeck


          Activity

          Achim Nierbeck added a comment -

          Just a possible solution I had in my mind:

          How about calling the main.destroy method with true and an optional parameter for a waiting timeout? If we then let the "Wrapper"-Main register a callback method in the Main method, we could call WrapperManager.signalStopping with the same timeout.
          I will try to fix it like this.
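
The callback idea above could be sketched roughly as follows. This is a minimal illustration in plain Java, not the code that was committed: LauncherSketch, ShutdownCallback, waitingForShutdown and CallbackDemo are hypothetical names, and the callback here only records the hint where a real wrapper would presumably forward it to WrapperManager.signalStopping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical callback; a service wrapper would forward the hint to its
// manager (for example WrapperManager.signalStopping) instead of recording it.
interface ShutdownCallback {
    void waitingForShutdown(int delayMillis);
}

// Hypothetical stand-in for the launcher's Main class.
class LauncherSketch {
    private final CountDownLatch frameworkStopped = new CountDownLatch(1);

    // Invoked (here by a simulated framework thread) once shutdown has finished.
    void frameworkHasStopped() {
        frameworkStopped.countDown();
    }

    // If await is true, blocks until the framework has stopped. Each time
    // timeoutMillis elapses without a stop, the callback is told that the
    // same amount of time is needed again.
    void destroy(boolean await, int timeoutMillis, ShutdownCallback callback)
            throws InterruptedException {
        if (!await) {
            return;
        }
        while (!frameworkStopped.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            if (callback != null) {
                callback.waitingForShutdown(timeoutMillis);
            }
        }
    }
}

class CallbackDemo {
    // Simulates a framework that needs about 350 ms to stop and returns the
    // hints the callback received while waiting.
    static List<Integer> run() {
        LauncherSketch launcher = new LauncherSketch();
        List<Integer> hints = new ArrayList<>();
        Thread framework = new Thread(() -> {
            try {
                Thread.sleep(350);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            launcher.frameworkHasStopped();
        });
        framework.start();
        try {
            launcher.destroy(true, 100, delay -> hints.add(delay));
            framework.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return hints;
    }
}
```

Note that this sketch waits indefinitely as long as the callback keeps being notified; whether an upper bound is wanted is a separate design decision.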

          Guillaume Nodet added a comment -

          This looks like a good way to go. I don't recall exactly how JSW works, but not waiting until the framework cleanly shuts down is certainly one cause of the problem.

          Achim Nierbeck added a comment -

          Fixed with revision 1050517

          Andreas Pieber added a comment -

          should be backported to karaf-2.1.x branch

          Andreas Pieber added a comment -

          Already fixed for trunk, should be backported to karaf-2.1.x branch

          Bengt Rodehav added a comment -

          Perfect. Will test as soon as possible. Do I have to configure anything (like timeouts) or do I just install Karaf as usual?

          Achim Nierbeck added a comment -

          Fixed for 2.1.3 on revision 1050938

          --------------------------------------------------

          Regarding the testing, no special configuration is needed right now.

          Bengt Rodehav added a comment -

          Achim,

          Sorry for taking so long to test this. I just got back from Christmas vacation and finally tried it out.

          However, I still get the exact same behaviour as before. I install Karaf as a service, I make sure that a Camel route has an inflight exchange and then I stop the service. I then expect the service to keep running until Camel gives up (just like the behaviour when running Karaf from the command line) but the service is stopped instantly.

          What am I doing wrong?

          Bengt Rodehav added a comment -

          I browsed through the source code and ended up in the class org.apache.karaf.shell.wrapper.Main. Seems like that's where all the action takes place ...

          If I understand the code correctly, the process is given 1.5 seconds to finish up before it is killed. Since the default Camel behaviour is to keep the process alive for up to 300 s if there are any inflight exchanges, that explains my problems. I guess I will have to do one of the following:

          a) Make sure that Camel's inflight timeout is less than Karaf's timeout. Would it be possible to make Karaf's exit timeout (currently hardcoded to 1 s) configurable?
          b) Somehow configure Camel to shut down the routes as fast as possible if the service is about to stop.

          What I don't quite understand is why the behaviour differs between running Karaf as a service and running Karaf from the command line. Intuitively, I think the behaviour should be the same. In the command line case, Karaf waits for Camel to finish (even up to 300 s), while in the service case there seems to be no connection between Camel's timeout (300 s) and Karaf's stop behaviour. Karaf terminates after 1.5 seconds regardless of what Camel does.

          Do you have any ideas of how this should be configured? Ideally I would like Karaf to wait for Camel no matter how Karaf was started.

          Achim Nierbeck added a comment -

          I'm reopening this, as I guess the fix didn't do the trick yet.

          Could you please provide a sample with Camel so I can test this?
          You are right, the action takes place in the class org.apache.karaf.shell.wrapper.Main.
          My assumption was that I tell the framework to shut down and wait for it, so I would
          have expected that the Camel route can safely be shut down.
          The 1.5 seconds should be the waiting time added to the wrapper on each loop iteration.

          Bengt Rodehav added a comment -

          It's not easy for me to provide a complete Camel sample for you since I have a pretty complex setup around the starting of camel routes.

          Basically I create a simple file copy route as follows:

          from("file:in").to("file:out");

          When I drop a file in the "in" directory, it is copied to the "out" directory. In order to force Camel to have an inflight exchange I configure the redelivery parameters and make sure that the "to" uri is invalid (thus forcing the route to attempt redelivery until it finally gives up). As follows:

          onException(Exception.class).maximumRedeliveries(10).delayPattern("0:2000;5:10000;10:60000;20:600000;25:1800000");
          from("file:in").to("file:g:/out");

          Since I have no "g:" drive on my computer, Camel cannot copy the file, but it will keep retrying for a period of time while I stop the Karaf service (or shut down Karaf with Ctrl-D from the command line).

          There are probably easier ways to force Camel to have an inflight exchange at the point of shutdown but this is the way I've been testing it.
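
To get a feel for how long such a route can hold an inflight exchange, the delay pattern above can be evaluated by hand. The sketch below is plain Java and does not call Camel; it assumes that each "threshold:delay" group applies from that redelivery attempt onward, with attempts counted from 1, which is how I read the Camel redelivery documentation. DelayPatternSketch, delayFor and totalDelay are hypothetical names.

```java
// Hypothetical helper that evaluates a "threshold:delay;..." pattern under the
// assumption stated above; it is not part of Camel.
class DelayPatternSketch {

    // Returns the delay in milliseconds for the given redelivery attempt:
    // the delay of the last group whose threshold is <= attempt, or 0 if none.
    static long delayFor(String pattern, int attempt) {
        long answer = 0;
        for (String group : pattern.split(";")) {
            String[] parts = group.split(":");
            int threshold = Integer.parseInt(parts[0].trim());
            long delay = Long.parseLong(parts[1].trim());
            if (threshold > attempt) {
                break;
            }
            answer = delay;
        }
        return answer;
    }

    // Sums the delays of attempts 1..maxRedeliveries.
    static long totalDelay(String pattern, int maxRedeliveries) {
        long total = 0;
        for (int attempt = 1; attempt <= maxRedeliveries; attempt++) {
            total += delayFor(pattern, attempt);
        }
        return total;
    }

    public static void main(String[] args) {
        String pattern = "0:2000;5:10000;10:60000;20:600000;25:1800000";
        System.out.println(totalDelay(pattern, 10) + " ms");
    }
}
```

Under that assumption, ten redeliveries add up to roughly two minutes of waiting (4 x 2 s, 5 x 10 s and 1 x 60 s), which is far longer than a stop timeout of a second or two.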

          Achim Nierbeck added a comment -

          The current version in trunk should be better now.
          I tested it and it did seem to give Camel more time to stop.
          Please verify it.

          Andreas Pieber added a comment -

          @achim: please also backport your patch to karaf-2.1.x; it cherry-picks cleanly

          Guillaume Nodet added a comment -

          I'm testing the fix and will commit a slightly modified version of it today (and backport it too for the release).

          Achim Nierbeck added a comment -

          I just wanted to have this fix verified before closing and applying to 2.1.x branch

          Guillaume Nodet added a comment -

          Yeah, but I've spotted a few other problems (for example, the wrapper does not know if someone calls osgi:shutdown).
          I'm committing right now.

          Guillaume Nodet added a comment -

          The wrapper shutdown seems to work great now in all the tests I've done.

          Achim Nierbeck added a comment -

          Attached is just a test Camel route.
          To test it, Camel 2.5 needs to be installed. Also make sure Spring 3.0.4 is installed.

          Bengt Rodehav added a comment -

          Sorry for taking so long to test this. I had great problems getting my application to run on Karaf 2.1.99-SNAPSHOT due to dependencies on different versions of Camel. I'm not sure why, but I managed to get my Camel routes up at last.

          It now works perfectly fine. When I stop the Windows service, it waits for my Camel route to finish first. Exactly the way I wanted it to work.

          Thanks,

          /Bengt

          Guillaume Nodet added a comment -

          The call to framework.stop() doesn't really wait for the framework to be shut down, but can block until a lock is acquired, so using a different thread makes sense.
          Note that while testing, I found out that spring-dm has a built-in 10 s timeout on shutting down spring-dm bundles, so the OSGi framework won't wait for the Camel route to shut down gracefully.
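
The point about framework.stop() can be illustrated with a small sketch. It uses a simplified, hypothetical stand-in (FakeFramework) whose stop() only initiates shutdown and whose waitForStop returns a boolean; the real OSGi Framework API has a different return type, so treat this as an outline of the pattern rather than the committed fix. The stop call is issued from a separate thread while the caller waits, with a timeout, for shutdown to actually complete.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for a framework whose stop() returns
// before shutdown has finished.
class FakeFramework {
    private final CountDownLatch stopped = new CountDownLatch(1);
    private final long shutdownMillis;

    FakeFramework(long shutdownMillis) {
        this.shutdownMillis = shutdownMillis;
    }

    // Only initiates shutdown: the work continues on a background thread.
    void stop() {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(shutdownMillis);
                stopped.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "fake-framework-shutdown");
        worker.setDaemon(true);
        worker.start();
    }

    // Blocks until shutdown has completed or the timeout elapses.
    boolean waitForStop(long timeoutMillis) throws InterruptedException {
        return stopped.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}

class StopSketch {
    // Calls stop() on its own thread so the caller cannot get stuck in it,
    // then waits for the shutdown itself. Returns true on a clean stop.
    static boolean stopAndWait(FakeFramework framework, long timeoutMillis) {
        Thread stopper = new Thread(framework::stop, "framework-stopper");
        stopper.setDaemon(true);
        stopper.start();
        try {
            return framework.waitForStop(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller would only terminate the process after stopAndWait returns, and could treat a false result as the signal that a bundle is still shutting down.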


            People

            • Assignee:
              Achim Nierbeck
              Reporter:
              Bengt Rodehav