Karaf / KARAF-910

Race between FeatureService and ConfigAdmin for resolving mvn: URLs?

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.2.2
    • Fix Version/s: 3.0.0
    • Component/s: karaf-feature
    • Labels:
      None
    • Environment:

      Talend Service Factory 2.4.2.0 (includes Karaf 2.2.2) with the Equinox core.

      Description

      I have an intermittent problem where my custom features.xml cannot be resolved. I use a tweaked etc/org.ops4j.pax.url.mvn.cfg file, so the features.xml file is never present in $HOME/.m2/repo but is instead resolved from a local repository relative to the app:

      org.ops4j.pax.url.mvn.defaultRepositories=file:${karaf.base}/system@snapshots,\
      file:${karaf.home}/${karaf.default.repository}@snapshots,\
      file:${karaf.base}/../../../.env/.m2/repo@snapshots
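      For readers less familiar with this setting, here is a rough Java sketch of how such a value might be interpreted once the continuation lines are joined: `${...}` placeholders are expanded from a property map, and each comma-separated entry is split into a repository URL plus optional `@` flags such as `@snapshots`. This is an illustrative model, not pax-url's actual parser; `RepoSpec` and `RepoListParser` are hypothetical names, and any property values used with it are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of one entry in a defaultRepositories list.
final class RepoSpec {
    final String url;
    final boolean snapshots;

    RepoSpec(String url, boolean snapshots) {
        this.url = url;
        this.snapshots = snapshots;
    }
}

final class RepoListParser {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with its value; unknown names are left untouched.
    static String expand(String value, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Split the comma-separated list and peel trailing @flags off each URL.
    static List<RepoSpec> parse(String value, Map<String, String> props) {
        List<RepoSpec> repos = new ArrayList<>();
        for (String raw : expand(value, props).split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            String[] parts = entry.split("@");
            List<String> flags = Arrays.asList(parts).subList(1, parts.length);
            repos.add(new RepoSpec(parts[0], flags.contains("snapshots")));
        }
        return repos;
    }
}
```

With `karaf.base=/opt/app`, the third entry expands to `file:/opt/app/../../../.env/.m2/repo`, flagged for snapshots.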

      Sometimes when I start Karaf, I get this error (actual URL edited for privacy):

      karaf@tsf> 2011-09-30 09:23:09,760 WARN [FeaturesServiceImpl.java:924] Unable to add features repository mvn:<my-group-id>/<my-artifact-id>/<my-version>/xml/features at startup - o.a.k.f.i.FeaturesServiceImpl
      java.lang.RuntimeException: URL [mvn:<my-group-id>/<my-artifact-id>/<my-version>/xml/features] could not be resolved.
      at org.ops4j.pax.url.mvn.internal.Connection.getInputStream(Connection.java:195) [na:na]
      at org.ops4j.pax.url.mvn.internal.AetherBridgeConnection.getInputStream(AetherBridgeConnection.java:68) [na:na]
      at org.apache.karaf.features.internal.FeatureValidationUtil.validate(FeatureValidationUtil.java:49) [na:na]
      at org.apache.karaf.features.internal.FeaturesServiceImpl.validateRepository(FeaturesServiceImpl.java:199) [na:na]
      at org.apache.karaf.features.internal.FeaturesServiceImpl.internalAddRepository(FeaturesServiceImpl.java:210) [na:na]
      at org.apache.karaf.features.internal.FeaturesServiceImpl.start(FeaturesServiceImpl.java:922) [na:na]
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [na:1.6.0_26]
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [na:1.6.0_26]
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [na:1.6.0_26]
      at java.lang.reflect.Method.invoke(Method.java:597) [na:1.6.0_26]
      at org.apache.aries.blueprint.utils.ReflectionUtils.invoke(ReflectionUtils.java:226) [org.apache.aries.blueprint:0.3.1]
      at org.apache.aries.blueprint.container.BeanRecipe.invoke(BeanRecipe.java:824) [org.apache.aries.blueprint:0.3.1]
      at org.apache.aries.blueprint.container.BeanRecipe.runBeanProcInit(BeanRecipe.java:636) [org.apache.aries.blueprint:0.3.1]
      at org.apache.aries.blueprint.container.BeanRecipe.internalCreate(BeanRecipe.java:724) [org.apache.aries.blueprint:0.3.1]
      at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:64) [org.apache.aries.blueprint:0.3.1]
      at org.apache.aries.blueprint.container.BlueprintRepository.createInstances(BlueprintRepository.java:219) [org.apache.aries.blueprint:0.3.1]
      at org.apache.aries.blueprint.container.BlueprintRepository.createAll(BlueprintRepository.java:147) [org.apache.aries.blueprint:0.3.1]
      at org.apache.aries.blueprint.container.BlueprintContainerImpl.instantiateEagerComponents(BlueprintContainerImpl.java:640) [org.apache.aries.blueprint:0.3.1]
      at org.apache.aries.blueprint.container.BlueprintContainerImpl.doRun(BlueprintContainerImpl.java:331) [org.apache.aries.blueprint:0.3.1]
      at org.apache.aries.blueprint.container.BlueprintContainerImpl.run(BlueprintContainerImpl.java:227) [org.apache.aries.blueprint:0.3.1]
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) [na:1.6.0_26]
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [na:1.6.0_26]
      at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_26]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) [na:1.6.0_26]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) [na:1.6.0_26]
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_26]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_26]
      at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]

      If I put a breakpoint in org.ops4j.pax.url.mvn.internal.Connection.getInputStream(), I can see that when it fails m_configuration.getDefaultRepositories() contains one repo ($HOME/.m2/repo) and when it succeeds m_configuration.getDefaultRepositories() contains the three repos I've specified in etc/org.ops4j.pax.url.mvn.cfg.

      I interpret that to mean that sometimes the features resolution happens before Felix reads the files in etc/ and sometimes afterward. I'm mostly using the same start levels as Karaf: my startup.properties file is identical to the following, except for a few additions I made.
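      The suspected race can be modelled outside OSGi in a few lines of Java: the handler starts with a default repository list, a second thread (standing in for Config Admin, which delivers `ManagedService.updated()` asynchronously) later swaps in the configured list, and a caller that resolves immediately, as the features service does at startup, usually still sees the default. `MvnHandlerModel` and `RaceDemo` are hypothetical, and the 50 ms delay only widens the window for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the mvn: URL handler's configuration state.
final class MvnHandlerModel {
    static final List<String> DEFAULT_REPOS =
            Collections.singletonList("file:${user.home}/.m2/repo");

    // volatile so a reader thread sees the new list once it has been applied
    private volatile List<String> repos = DEFAULT_REPOS;

    List<String> currentRepos() {
        return repos;
    }

    // Invoked from the "config admin" thread, like ManagedService.updated().
    void updated(List<String> configured) {
        repos = Collections.unmodifiableList(configured);
    }
}

final class RaceDemo {
    // Schedules the configuration on another thread, then returns what an
    // immediate caller (the features service at startup) observes.
    static List<String> resolveDuringStartup(MvnHandlerModel handler, ExecutorService configAdmin) {
        configAdmin.submit(() -> {
            try {
                Thread.sleep(50); // simulated delay before the update is delivered
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            handler.updated(Arrays.asList(
                    "file:/app/system", "file:/karaf/system", "file:/env/.m2/repo"));
        });
        return handler.currentRepos(); // usually still DEFAULT_REPOS at this point
    }
}
```

Nothing here is wrong in isolation; the failure comes purely from reading the configuration before the asynchronous update has landed.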

      http://svn.apache.org/viewvc/karaf/trunk/assemblies/apache-karaf/src/main/filtered-resources/etc/startup.properties?revision=1176017&view=markup

      Attachments
      1. KARAF-910-ops4j-2.diff (7 kB, David Jencks)
      2. KARAF-910-ops4j.diff (5 kB, David Jencks)

        Activity

        Jean-Baptiste Onofré added a comment -

        We don't have the problem with wrap, for instance, as pax-url-wrap is in startup.properties.

        The problem is more likely with war, for instance, but I think that KARAF-608 should address this issue.

        Chris Dolan added a comment -

        In my opinion, the PAX-URL change is only half of the solution. There's still a problem if a URL handler is not ready in time: FeatureService gets a MalformedURLException and does not retry the feature once the URL handler is actually ready.

        One possible workaround for that problem is in KARAF-918, where you would include an explicit prereq on the URL handler in the feature.
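        One way to avoid dropping a repository whose handler is late is to park the URL instead of failing it, and re-validate it once a handler for its scheme appears. In OSGi that trigger would come from a ServiceListener or ServiceTracker on URLStreamHandlerService filtered by url.handler.protocol; the plain-Java sketch below models only the bookkeeping. `PendingRepositories`, `handlerRegistered` and the validator callback are hypothetical names, not Karaf API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical bookkeeping for repository URLs whose scheme has no handler yet.
final class PendingRepositories {
    private final Set<String> availableSchemes = new HashSet<>();
    private final Map<String, List<String>> pendingByScheme = new HashMap<>();
    private final Consumer<String> validator; // what adding the repository would do

    PendingRepositories(Consumer<String> validator) {
        this.validator = validator;
    }

    static String schemeOf(String url) {
        int colon = url.indexOf(':');
        return colon < 0 ? "" : url.substring(0, colon);
    }

    // Validate now if a handler exists, otherwise remember the URL for later.
    synchronized void add(String url) {
        String scheme = schemeOf(url);
        if (availableSchemes.contains(scheme)) {
            validator.accept(url);
        } else {
            pendingByScheme.computeIfAbsent(scheme, s -> new ArrayList<>()).add(url);
        }
    }

    // Called when a URL handler for the scheme is registered.
    synchronized void handlerRegistered(String scheme) {
        availableSchemes.add(scheme);
        List<String> waiting = pendingByScheme.remove(scheme);
        if (waiting != null) {
            waiting.forEach(validator); // runs under the lock; fine for a sketch
        }
    }

    synchronized int pendingCount() {
        int n = 0;
        for (List<String> urls : pendingByScheme.values()) {
            n += urls.size();
        }
        return n;
    }
}
```

A parked URL produces no error until its handler appears, so a shell user would not see real validation failures promptly; this is one of the open questions Chris Dolan lists in his proposal.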

        Jean-Baptiste Onofré added a comment -

        I confirm that pax-url 1.4-RC1 includes David's patch. We are preparing pax-url 1.4-RC2, which includes some other fixes.

        Anyway, I'm closing this issue as it seems fixed to me.

        Chris Dolan added a comment -

        Just a note for future visitors: I created a nasty hack in my project that works around this problem reliably.

        I create a small bundle with the code below and start it at level 12 in startup.properties (right after configadmin and fileinstall). It works by blocking the OSGi framework thread until the configadmin thread can catch up. While blocking, it polls for a known private mvn: URL that must exist. It assumes that the version of the hack bundle is the same as the desired resource.

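        import java.net.URL;
        import java.util.Date;

        import org.osgi.framework.BundleActivator;
        import org.osgi.framework.BundleContext;
        import org.osgi.framework.Version;

        public class Activator implements BundleActivator {
            public static final long ITERATION_MILLIS = 250L;
            public static final long MAX_WAIT_MILLIS = 10000L;

            @Override
            public void start(BundleContext context) throws InterruptedException {
                // This bundle is versioned like the features.xml it probes for; OSGi spells
                // the qualifier "1.0.1.SNAPSHOT" where Maven expects "1.0.1-SNAPSHOT".
                Version version = context.getBundle().getVersion();
                String mvnVersion = version.toString().replace(".SNAPSHOT", "-SNAPSHOT");
                String probe = "mvn:com.example/example/" + mvnVersion + "/xml/features";

                long now = System.currentTimeMillis();
                long end = now + MAX_WAIT_MILLIS;
                while (now < end) {
                    try {
                        // Fails (MalformedURLException or "could not be resolved") until the
                        // mvn: handler is registered and its etc/ configuration is applied.
                        new URL(probe).openStream().close();
                        System.out.println("mvn cfg is ready - " + new Date(now));
                        break;
                    } catch (Exception e) {
                        System.out.println("Waiting for mvn cfg - " + new Date(now) + " - " + e);
                    }
                    Thread.sleep(ITERATION_MILLIS);
                    now = System.currentTimeMillis();
                }
                // On timeout, startup simply continues and the features service may still fail.
            }

            @Override
            public void stop(BundleContext context) {
            }
        }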
        

        At runtime, I get this output:

        Waiting for mvn cfg - Tue Feb 28 08:17:00 CST 2012 - java.lang.RuntimeException: URL [mvn:com.example/example/1.0.1-SNAPSHOT/xml/features] could not be resolved.
        Waiting for mvn cfg - Tue Feb 28 08:17:00 CST 2012 - java.lang.RuntimeException: URL [mvn:com.example/example/1.0.1-SNAPSHOT/xml/features] could not be resolved.
        Waiting for mvn cfg - Tue Feb 28 08:17:01 CST 2012 - java.lang.RuntimeException: URL [mvn:com.example/example/1.0.1-SNAPSHOT/xml/features] could not be resolved.
        mvn cfg is ready - Tue Feb 28 08:17:01 CST 2012
        
        David Jencks added a comment -

        I've applied all my ops4j pax-url-aether patches and also the necessary Karaf tweaks to trunk (r1229812). I also changed all use of pax-url-mvn to pax-url-aether. Pax-url-* versions are now 1.4-SNAPSHOT.

        Chris Dolan added a comment -

        Added http://team.ops4j.org/browse/PAXURL-147 to track this on the ops4j side.

        I find the fallback configuration point a little confusing, but if I understand David's explanation correctly then I like this patch. I think it resolves my concerns.

        David Jencks added a comment -

        Again, for ops4j.

        This version depends on a flag in the framework properties: org.ops4j.pax.url.mvn.require.config.admin.config

        If present, it waits for a config admin based configuration (that should not include this flag). So including this flag in framework properties should prevent the mvn url handler from registering until the config admin config shows up.

        The other url handlers don't check for validity so they will work with framework property configuration as they do now.

        Another difference is that to make this work you need to avoid fallback configuration. Either you use config admin config or framework property config but you can't mix them. Otherwise the flag will always be seen and prevent any configuration at all.
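        The warning about fallback configuration can be illustrated with a small lookup model: if keys missing from the Config Admin dictionary fall back to framework properties, the require flag set in framework properties is always visible, so the handler never considers itself configured. Only the flag name comes from this comment; `ConfigLookup` and `readyToRegister` are hypothetical, and the real patch may structure this differently.

```java
import java.util.Map;

final class ConfigLookup {
    static final String REQUIRE_FLAG = "org.ops4j.pax.url.mvn.require.config.admin.config";

    // Effective value: config admin first, then (optionally) framework properties.
    static String get(String key, Map<String, String> configAdmin,
                      Map<String, String> framework, boolean fallback) {
        if (configAdmin != null && configAdmin.containsKey(key)) {
            return configAdmin.get(key);
        }
        return fallback ? framework.get(key) : null;
    }

    // The handler may register once the flag is absent from its effective configuration.
    static boolean readyToRegister(Map<String, String> configAdmin,
                                   Map<String, String> framework, boolean fallback) {
        if (configAdmin == null) {
            // No config admin dictionary yet: framework properties are the configuration.
            return !framework.containsKey(REQUIRE_FLAG);
        }
        return get(REQUIRE_FLAG, configAdmin, framework, fallback) == null;
    }
}
```

With fallback disabled, the handler waits until Config Admin delivers a dictionary and then registers; with fallback enabled, the flag leaks through from the framework properties and registration never happens.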

        Chris Dolan added a comment -

        David, that patch looks good for pax-url-mvn, but I think refusing to register the URL handler without a config is wrong for some of the other URL handlers (e.g. pax-url-wrap or pax-url-war). Those usually do not need any configuration.

        David Jencks added a comment -

        This is ASL-licensed and can be included in ops4j. If someone finds a place for it in Apache, that's fine too.

        David Jencks added a comment -

        I looked into the pax url stuff and came to a similar conclusion. Things might be a little more complicated than you indicate, since it is possible to configure the URL handlers using framework properties as well as config admin. (IIUC, in Karaf system properties get added to framework properties, so you can theoretically configure the URL handler using system properties right now.) I'll attach the POC ops4j code I came up with as a starting point; I didn't figure out how to test it.

        Chris Dolan added a comment -

        Here is a proposed multi-part solution to this race: 1) change pax-url-common to optionally delay registering a URL handler until the service is configured, 2) change FeatureValidationUtil to treat MalformedURLException specially.

        The class org.ops4j.pax.url.commons.handler.HandlerActivator currently registers the URL handler unconditionally in its start() method. I propose that it should instead check a flag (perhaps a boolean Java system property like "${pid}.configurationRequired") and, if that flag is true, register the URL handler only after ManagedService.updated() is invoked with a non-null value. This flag is the most unpleasant part of this proposal: we need to preserve the existing behavior that the handler is registered with the default configuration, but we also need a way to indicate that the configuration is required. This reminds me of the Declarative Services feature `configuration-policy="require"`, which is exactly the desired behavior here.

        The malformed URL is even harder to solve, I think. FeaturesServiceImpl needs to detect the URL failure and retry validation later, once the URL handler is registered. That "when" can be satisfied by registering a listener for the registration of a URLStreamHandlerService whose "url.handler.protocol" property includes the desired URL scheme. Again, thinking in terms of Declarative Services, this is like adding a `<reference interface="org.osgi.service.url.URLStreamHandlerService" cardinality="1..1" filter="(url.handler.protocol=mvn)"/>`

        Open questions:

        • is there a better way to signal to pax-url-common that the configuration is required?
        • should FeaturesServiceImpl have a timeout while waiting for the URL handler?
        • should FeaturesServiceImpl block while waiting for the URL handler?
        • if FeaturesServiceImpl asynchronously waits for the URL handler, how will a user who loads the feature from the Karaf command line know about real validation failures?
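        Part 1 of this proposal amounts to a configuration-required policy: with the flag set, `updated(null)` is ignored and the handler is registered on the first non-null configuration, while handlers without the flag (wrap:, war:) keep registering immediately. The plain-Java model below captures only that state machine; `GatedHandlerActivator`, the `Runnable` standing in for service registration and the exact `<pid>.configurationRequired` property name are assumptions based on the suggestion above, not pax-url code.

```java
import java.util.Dictionary;

// Hypothetical model of HandlerActivator with an opt-in "configuration required" policy.
final class GatedHandlerActivator {
    private final boolean configurationRequired;
    private final Runnable registerHandler; // stands in for registering the URLStreamHandlerService
    private boolean registered;

    GatedHandlerActivator(String pid, Runnable registerHandler) {
        // Opt-in per handler, so handlers that need no configuration are unaffected.
        this.configurationRequired = Boolean.getBoolean(pid + ".configurationRequired");
        this.registerHandler = registerHandler;
    }

    synchronized void start() {
        if (!configurationRequired) {
            register(); // existing behaviour: register immediately with defaults
        }
    }

    // Mirrors ManagedService.updated(Dictionary): null means "no configuration yet".
    // A real handler would also apply the new values here.
    synchronized void updated(Dictionary<String, ?> properties) {
        if (properties != null) {
            register();
        }
    }

    private void register() {
        if (!registered) {
            registered = true;
            registerHandler.run();
        }
    }

    synchronized boolean isRegistered() {
        return registered;
    }
}
```

The design keeps the current default (register at start) for every handler that does not set the flag, which addresses the concern about wrap: and war: needing no configuration.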
        David Jencks added a comment -

        Boot features in trunk are unusable with this bug. You typically get the mvn URL handler downloading artifacts from remote repos without checking either the Karaf repo or your local Maven repo, so locally built snapshots are almost never used.

        David Jencks added a comment -

        I looked at this a little more. I think we may need to do several things to solve this completely. Even if config admin is running before we start the mvn URL handler, the config is available, and we follow Neil's advice and change pax mvn url to wait to register the mvn URL handler until the config is installed, Karaf may have started all the startup bundles and be trying to start bundles with mvn URLs before config admin gets around to applying the config and making the mvn URL handler available. I think we need to provide Karaf with some way to wait until the mvn URL handler is correctly configured, and to use it before we process the boot features.
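        The "some way to wait" could be a gate that boot-feature processing blocks on, released once the mvn: handler has applied its configuration and bounded by a timeout so that startup cannot hang; blocking and timeouts are also two of the open questions listed in Chris Dolan's proposal. A minimal sketch using java.util.concurrent.CountDownLatch; `HandlerReadyGate` is a hypothetical name, not part of Karaf or pax-url.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical gate: boot features wait here until the mvn: handler is configured.
final class HandlerReadyGate {
    private final CountDownLatch configured = new CountDownLatch(1);

    // Called by the handler once a non-null configuration has been applied.
    void markConfigured() {
        configured.countDown();
    }

    // Returns true if the handler became ready within the timeout.
    boolean awaitConfigured(long timeout, TimeUnit unit) {
        try {
            return configured.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If `awaitConfigured` returns false, the caller still has to choose between continuing with defaults and reporting an error, the same trade-off as the 10-second limit in Chris Dolan's workaround bundle.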

        Chris Dolan added a comment -

        David, thanks, but that's definitely not the solution in my case. A breakpoint tells me that the cfg file is already parsed and in memory. The issue is that it's applied asynchronously, on a different thread from the activator. Meanwhile, the activator thread is starting other dependent bundles. noInitialDelay could narrow the race window, but it can't solve it single-handedly.

        David Jencks added a comment -

        Some preliminary experiments show that adding

        felix.fileinstall.noInitialDelay=true

        to etc/custom.properties may fix the problems I was having with the mvn url handler related to this issue. It will take a day or two to be sure as my locally built snapshots expire.

        Chris Dolan added a comment -

        Correct, I added my custom features.xml to the bootFeatures list.

        Jean-Baptiste Onofré added a comment -

        Chris, is your feature listed in the bootFeatures of the org.apache.karaf.features.cfg file?

        As the fix will certainly include some refactoring, I prefer to postpone it to 2.2.5 at the earliest.

        Chris Dolan added a comment -

        Thanks very much, Jean-Baptiste. Let me know if I can help in any way. I've now tied this root cause to four separate intermittent boot problems in my application (Pax-Web-Jetty, Pax-Logging-Service, Pax-URL-Maven and a private service), so this has become my #1 bug to watch. (But the mvn: problem is the only one that's actually blocking anything; the others just cause warnings, wasted performance, or a couple of dropped log messages due to configuring subsystems twice.)

        I'm very curious what your proposed solution will be, because this looks like a fundamental limitation of the ManagedService API to me.

        Jean-Baptiste Onofré added a comment -

        Thanks for this report Chris. We are going to try to include a fix in Karaf 2.2.4.

        Chris Dolan added a comment -

        It looks like this race may be an inherent by-product of the ManagedService contract (see Neil Bartlett's answer to http://stackoverflow.com/questions/7616295/how-can-you-get-managedservice-configuration-immediately). If so, then it may be necessary to alter org.ops4j.pax.url.mvn to handle this scenario, or else retry the features.xml resolution. It's not clear if Karaf or Pax is the best place to solve that.


          People

          • Assignee:
            Jean-Baptiste Onofré
            Reporter:
            Chris Dolan
          • Votes:
            0
            Watchers:
            0
