Whirr / WHIRR-221

Optionally control the order of starting services

    Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Component/s: core, documentation
    • Labels:
      None

      Description

      As Lars suggested in WHIRR-170:

      The user should "be able to optionally control the order (services start). This could be role based and specified like so:

      whirr.role-order=zk,nn+jt,dn+tt,hbase-master,hbase-regionserver
      

      If not specified, the system should make every effort to start the services as quickly as possible, for example in multiple threads. In other words, when the role-order is not given, no guarantee about order can be given."
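To make the proposed property concrete, here is a minimal Java sketch of how a value like the one above could be split into ordered stages. It assumes that commas separate stages started one after another and that "+" joins roles started together within a stage; the issue text does not spell this out. The class and method names (RoleOrderSketch, parse) are hypothetical and not part of Whirr.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of Whirr: splits a role-order value into stages. */
final class RoleOrderSketch {

  /**
   * Assumption: "," separates stages that start one after another,
   * and "+" joins roles that start together within one stage.
   */
  static List<Set<String>> parse(String roleOrder) {
    List<Set<String>> stages = new ArrayList<>();
    if (roleOrder == null || roleOrder.trim().isEmpty()) {
      return stages; // no order given: the caller may start everything in parallel
    }
    for (String stage : roleOrder.split(",")) {
      Set<String> roles = new LinkedHashSet<>();
      for (String role : stage.split("\\+")) {
        String trimmed = role.trim();
        if (!trimmed.isEmpty()) {
          roles.add(trimmed);
        }
      }
      if (!roles.isEmpty()) {
        stages.add(Collections.unmodifiableSet(roles));
      }
    }
    return stages;
  }

  public static void main(String[] args) {
    // Prints the stages in order, one set of roles per stage.
    System.out.println(parse("zk,nn+jt,dn+tt,hbase-master,hbase-regionserver"));
  }
}
```

An empty result stands for "no order requested", which matches the description: without role-order, no ordering guarantee is given.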

      Attachments

      1. WHIRR-221.patch
        97 kB
        David Alves
      2. WHIRR-221.patch
        97 kB
        David Alves
      3. WHIRR-221-partial-services-update.patch
        12 kB
        Andrei Savu
      4. WHIRR-221.patch
        84 kB
        David Alves
      5. WHIRR-221-karaf-feature-supplement_txt.patch
        2 kB
        Ioannis Canellos
      6. error-karaf.txt
        176 kB
        David Alves
      7. WHIRR-221.patch
        82 kB
        David Alves
      8. WHIRR-221.patch
        80 kB
        David Alves
      9. WHIRR-221.patch
        65 kB
        David Alves
      10. WHIRR-221.patch
        57 kB
        David Alves
      11. WHIRR-221-v3.patch
        16 kB
        David Alves
      12. WHIRR-221-v1.patch
        13 kB
        David Alves
      13. WHIRR-221.patch
        13 kB
        David Alves

          Activity

          Ioannis Canellos added a comment -

          I attempted to create the required bundles at servicemix bundles in order to use them here. They have been released but they have issues. In case you want to commit the patch before the issues are fixed, you can add the following to the whirr feature:

          <bundle dependency="true">wrap:mvn:net.sf.jung/jung-api/${jung.version}$Bundle-SymbolicName=jung-api</bundle>
          <bundle dependency="true">wrap:mvn:net.sf.jung/jung-graph-impl/${jung.version}$Fragment-Host=jung-api</bundle>

          So that we can auto-wrap the jung jars.

          Andrei Savu added a comment -

          PS: I think we will end up resolving a lot of conflicts...

          Andrei Savu added a comment - - edited

          David, it would be great if we could manage to commit this over the next few days. There are other large changes coming soon (e.g. WHIRR-504).

          David Alves added a comment -

          Let me try to clean up the method names as per your suggestions and write better javadoc, and let's see if it makes more sense afterwards.

          David Alves added a comment -

          Tom: Two things:
          1 - In my opinion getDependencies() should naturally return a set of the same type as getRole(), since the latter is the name of the current service and the former the names of its dependencies.
          2 - The method getOnlineDelayMillis() returns the delay that the implementing handler takes to come online after its startup script has returned successfully, not the time its dependencies take to come online.

          Tom White added a comment -

          Another idea: change getDependencies() to return a set of ServiceDependency objects (not sure about the name), and that can have the delay method on it.
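One way to read this suggestion is sketched below in Java. Only the names getRole(), getDependencies(), getOnlineDelayMillis() and ServiceDependency come from the thread; the interface DependencyAwareHandler, the example handler, the role names used and the 5-second value are hypothetical illustrations, and the real ClusterActionHandler has other methods that are not shown.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical value object: a role this handler depends on, plus an optional delay. */
final class ServiceDependency {
  private final String role;
  private final long onlineDelayMillis;

  ServiceDependency(String role, long onlineDelayMillis) {
    if (role == null || role.isEmpty()) {
      throw new IllegalArgumentException("role is required");
    }
    if (onlineDelayMillis < 0) {
      throw new IllegalArgumentException("delay must be >= 0");
    }
    this.role = role;
    this.onlineDelayMillis = onlineDelayMillis;
  }

  String getRole() {
    return role;
  }

  /** Assumed meaning: how long to wait after the dependency's start script returns. */
  long getOnlineDelayMillis() {
    return onlineDelayMillis;
  }
}

/** Hypothetical slice of a handler interface, reduced to the two methods discussed. */
interface DependencyAwareHandler {
  String getRole();

  Set<ServiceDependency> getDependencies();
}

/** Illustrative handler: role "dn" declares a dependency on role "nn". */
final class DataNodeHandlerSketch implements DependencyAwareHandler {
  @Override
  public String getRole() {
    return "dn";
  }

  @Override
  public Set<ServiceDependency> getDependencies() {
    Set<ServiceDependency> deps = new LinkedHashSet<>();
    deps.add(new ServiceDependency("nn", 5000L)); // 5 s is an arbitrary example value
    return Collections.unmodifiableSet(deps);
  }
}
```

Putting the delay on the dependency object keeps the handler interface itself free of timing concerns, which seems to be the point of the suggestion.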

          David Alves added a comment -

          +1, I also agree. Given the choice, polling for the service is much better. I still think the option to introduce a delay will be useful.
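For comparison with a fixed delay, here is a minimal Java sketch of polling with a timeout. The thread proposes doing the polling in the service scripts; this generic helper only illustrates the idea, and the names PollSketch and pollUntil are hypothetical, not Whirr API.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of Whirr: poll a readiness check instead of sleeping blindly. */
final class PollSketch {

  /**
   * Calls check repeatedly until it returns true or timeoutMillis elapses.
   * Returns true if the check succeeded in time, false on timeout or interruption.
   */
  static boolean pollUntil(BooleanSupplier check, long timeoutMillis, long intervalMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (check.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }

  public static void main(String[] args) {
    final int[] attempts = {0};
    // Stand-in for a real probe such as "is the service port accepting connections?"
    boolean online = pollUntil(() -> ++attempts[0] >= 3, 1000L, 10L);
    System.out.println("online=" + online + " after " + attempts[0] + " checks");
  }
}
```

Unlike a fixed delay, the wait ends as soon as the check passes, so a fast service does not slow the launch down.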

          Andrei Savu added a comment -

          "Also, adding delays in will slow things down - we'd rather have the polling etc done by the service scripts"

          I agree but it's much more difficult to do that now. I would make it a priority for 0.9.0. What do you think?

          David Alves added a comment -

          "getRequiredRoles() -> getDependencies(), since that's the phrasing used elsewhere."

          ok

          "getOnlineDelayMillis() feels out of place here. It talks about stages, but it's not clear how an event handler maps to a stage. Also, adding delays in will slow things down - we'd rather have the polling etc done by the service scripts. Can we leave this out of the interface and put it in the scripts, if it's needed at all?"

          Perhaps the best thing here is to make the javadoc clearer. getOnlineDelayMillis() only refers to the time the service is expected to take to come online after its start script returns successfully. It need not be implemented by every handler, since it is implemented in ClusterActionHandlerSupport. As for its mapping to stages: the user (the implementor of ClusterActionHandler) need not know this, but a stage (a set of roles that do not depend on one another and can therefore be started together) does not end until the largest of its delays has passed.
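The stage idea described here can be sketched in Java as follows. This is an illustration under stated assumptions, not the patch's implementation: roles are grouped by simple layering over a dependency map, and a stage's wait is the largest online delay among its roles. The class StageSketch, its methods, and the example dependencies and delay values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch, not the WHIRR-221 patch: layer roles into stages by dependency. */
final class StageSketch {

  /** A role lands in the first stage that comes after all of its dependencies. */
  static List<Set<String>> stages(Map<String, Set<String>> dependencies) {
    Map<String, Set<String>> remaining = new LinkedHashMap<>(dependencies);
    List<Set<String>> result = new ArrayList<>();
    Set<String> started = new TreeSet<>();
    while (!remaining.isEmpty()) {
      Set<String> stage = new TreeSet<>();
      for (Map.Entry<String, Set<String>> e : remaining.entrySet()) {
        if (started.containsAll(e.getValue())) {
          stage.add(e.getKey());
        }
      }
      if (stage.isEmpty()) {
        throw new IllegalStateException(
            "cyclic or unsatisfied dependencies among: " + remaining.keySet());
      }
      remaining.keySet().removeAll(stage);
      started.addAll(stage);
      result.add(stage);
    }
    return result;
  }

  /** A stage does not end until the largest of its roles' online delays has passed. */
  static long stageDelayMillis(Set<String> stage, Map<String, Long> onlineDelayMillis) {
    long max = 0L;
    for (String role : stage) {
      Long delay = onlineDelayMillis.get(role);
      if (delay != null) {
        max = Math.max(max, delay);
      }
    }
    return max;
  }

  public static void main(String[] args) {
    // Example dependencies are illustrative only.
    Map<String, Set<String>> deps = new LinkedHashMap<>();
    deps.put("zk", new TreeSet<String>());
    deps.put("nn", new TreeSet<String>());
    deps.put("dn", new TreeSet<>(Arrays.asList("nn")));
    deps.put("hbase-master", new TreeSet<>(Arrays.asList("zk", "nn")));

    Map<String, Long> delays = new HashMap<>();
    delays.put("zk", 2000L);
    delays.put("nn", 5000L);

    for (Set<String> stage : stages(deps)) {
      System.out.println(stage + " waits " + stageDelayMillis(stage, delays) + " ms");
    }
  }
}
```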

          Will try to do something about the most severe formatting problems.

          Tom White added a comment -

          Overall this looks good to me. However, it's hard to read the patch since a lot of formatting has changed. Can you regenerate the patch to only have the real changes please? (E.g. import order changes are not needed.)

          • getRequiredRoles() -> getDependencies(), since that's the phrasing used elsewhere.
          • getOnlineDelayMillis() feels out of place here. It talks about stages, but it's not clear how an event handler maps to a stage. Also, adding delays in will slow things down - we'd rather have the polling etc done by the service scripts. Can we leave this out of the interface and put it in the scripts, if it's needed at all?
          • The tests look great.
          David Alves added a comment -

          removed wrongly committed sysouts

          David Alves added a comment -
          • changed the way the scripts are named so that they reflect only the roles executed in that stage (vs all the roles in the template)
          • added fail fast on missing dependencies (both for missing roles from configuration and missing handlers for roles)

          Added more tests:
          Unit - test exception thrown if dependency not configured in any template
          Unit - test exception thrown if no handler for dependency
          DryRun - test scripts are executed in the correct order (single node)
          DryRun - test scripts are executed in the correct order (two nodes)
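The fail-fast behaviour described above could look roughly like the following Java sketch: check every declared dependency before anything is started, and throw if it is missing from the configured templates or has no handler. The class DependencyCheckSketch, its parameters and the exception type are assumptions made for illustration; the patch may structure this differently.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch, not the WHIRR-221 patch: fail fast on missing dependencies. */
final class DependencyCheckSketch {

  /**
   * Throws before anything is started if a dependency has no registered handler
   * or is not configured in any instance template.
   */
  static void validate(Map<String, Set<String>> dependenciesByRole,
                       Set<String> configuredRoles,
                       Set<String> rolesWithHandlers) {
    for (Map.Entry<String, Set<String>> e : dependenciesByRole.entrySet()) {
      for (String dependency : e.getValue()) {
        if (!rolesWithHandlers.contains(dependency)) {
          throw new IllegalArgumentException("Role " + e.getKey() + " depends on "
              + dependency + ", but no handler is registered for it");
        }
        if (!configuredRoles.contains(dependency)) {
          throw new IllegalArgumentException("Role " + e.getKey() + " depends on "
              + dependency + ", but it is not configured in any template");
        }
      }
    }
  }
}
```

Running such a check up front means a misconfigured cluster is rejected before any node is provisioned, which is what the two new unit tests exercise.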

          David Alves added a comment -

          Ioannis: In the latest patch the dependency on jung-algorithms doesn't exist anymore (and I removed it from feature.xml). Was the problem with this dependency only?

          Ioannis Canellos added a comment -

          Even though whirr installs properly inside OSGi with the committed patch, it fails to properly launch a cluster, due to the way that jung is structured.

          As a solution I have started creating an OSGi uber bundle for jung (I can't see another way of fixing it) and its dependencies: SMX4-1090, SMX4-1091, SMX4-1092.

          What I need to know in order to complete it is whether there is a runtime dependency on jung-algorithms. It appears as a dependency, but it seems that whirr is working even without it.

          Andrei Savu added a comment -

          I can confirm that services are started in the right order for Hadoop using trunk+221+294. It feels a bit more robust already.

          Andrei Savu added a comment -

          I have decided to build WHIRR-294 on top of this one.

          Andrei Savu added a comment -

          David - your patch looks good to me but I would like to wait for someone else to review because it's a big change (affecting the core). Karel? Adrian? Tom?

          Andrei Savu added a comment -

          In this patch I have updated Hadoop & HBase to declare the required roles - unfortunately they fail - I think we also need to update the scripts.

          David Alves added a comment -
          • included OSGification of new deps (thanks Ioannis)
          • updated to latest trunk
          • changed BootstrapClusterAction to use executors from ClusterAction (thanks Andrei)

          maven build is working and all unit tests pass.

          please review!

          David Alves added a comment -

          thanks Ioannis

          This procedure should be documented somewhere, since from now on no dependency can be added to whirr without going through it (or letting the build fail).
          Maybe a note on the wiki?

          Ioannis Canellos added a comment -

          I am attaching a patch which updates the Karaf feature descriptor with the jung dependency. This patch uses the wrap protocol to OSGify bundles on the fly. In the meantime I will work on creating proper jung bundles.

          Andrei Savu added a comment -

          "long ago when I used to use OSGi we had to bundlize every lib or include the libs in a uber-bundle. Is that still the case?"

          I think so. Ioannis?

          David Alves added a comment -

          strange that pax doesn't complain about that.

          java.lang.Exception: Could not start bundle mvn:org.jclouds.driver/jclouds-sshj/1.3.1 in feature(s) jclouds-driver-sshj-1.3.1: Unresolved constraint in bundle jclouds-sshj [75]: Unable to resolve 75.0: missing requirement [75.0] package; (&(package=org.jclouds.compute.domain)(version>=1.3.1))
          at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeatures(FeaturesServiceImpl.java:356)

          long ago when I used to use OSGi we had to bundlize every lib or include the libs in a uber-bundle. Is that still the case?

          Andrei Savu added a comment -

          I think it's happening because of net.sf.jung - it needs osgification
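When chasing this kind of failure it can help to pull the unresolved package out of the error text. The Java sketch below extracts the package name and minimum version from a message shaped like the "missing requirement" line quoted in this thread. It is a hypothetical diagnostic helper (MissingRequirementSketch is not part of Whirr or Karaf) and assumes the filter always has the form seen here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper: reads the package filter out of a resolver error. */
final class MissingRequirementSketch {

  // Matches the filter part, e.g. (package=org.jclouds.compute.domain)(version>=1.3.1)
  private static final Pattern FILTER =
      Pattern.compile("\\(package=([^)]+)\\)(?:\\(version>=([^)]+)\\))?");

  /** Returns "package" or "package >= version", or null if no package filter is found. */
  static String missingPackage(String message) {
    Matcher m = FILTER.matcher(message);
    if (!m.find()) {
      return null;
    }
    return m.group(2) == null ? m.group(1) : m.group(1) + " >= " + m.group(2);
  }
}
```

Knowing which package failed to resolve narrows the search to the bundle that should export it, which may itself have failed to start because of one of its own dependencies.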

          Andrei Savu added a comment -

          I am unable to replicate it on trunk (5 runs - maven 3.0.3 r1075438 on mac osx with java 1.6.0_29) but I'm also seeing it after applying the patch (consistently).

          David Alves added a comment -

          Ok, the failure is transient in trunk, but not with the patch.

          Any help would be much appreciated.

          The first error that happens is:

          -------------------------------------------------------
           T E S T S
          -------------------------------------------------------
          Running org.apache.whirr.karaf.itest.WhirrInstallationTest
          SLF4J: Class path contains multiple SLF4J bindings.
          SLF4J: Found binding in [jar:file:/Users/dralves/.m2/repository/org/apache/karaf/org.apache.karaf.client/2.2.5/org.apache.karaf.client-2.2.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
          SLF4J: Found binding in [jar:file:/Users/dralves/.m2/repository/org/slf4j/slf4j-simple/1.6.1/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
          SLF4J: Found binding in [jar:file:/Users/dralves/.m2/repository/org/ops4j/pax/logging/pax-logging-api/1.6.3/pax-logging-api-1.6.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
          SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
          [org.ops4j.pax.url.mvn.internal.Connection] : Resolving [mvn:org.apache.karaf/apache-karaf/2.2.5/tar.gz]
          [org.ops4j.pax.url.mvn.internal.Connection] : Collecting versions from repository [file:/Users/dralves/.m2/repository/,releases=true,snapshots=true]
          [org.ops4j.pax.url.mvn.internal.Connection] :   Resolving exact version
          [org.ops4j.pax.url.mvn.internal.Connection] : Collecting versions from repository [http://osgi.sonatype.org/content/groups/pax-runner/,releases=true,snapshots=false]
          [org.ops4j.pax.url.mvn.internal.Connection] :   Resolving exact version
          [org.ops4j.pax.url.mvn.internal.Connection] : Collecting versions from repository [http://repo1.maven.org/maven2/,releases=true,snapshots=false]
          [org.ops4j.pax.url.mvn.internal.Connection] :   Resolving exact version
          [org.ops4j.pax.url.mvn.internal.Connection] : Collecting versions from repository [http://repository.ops4j.org/maven2/,releases=true,snapshots=false]
          [org.ops4j.pax.url.mvn.internal.Connection] :   Resolving exact version
          [org.ops4j.pax.url.mvn.internal.Connection] : Collecting versions from repository [http://repository.springsource.com/maven/bundles/release/,releases=true,snapshots=false]
          [org.ops4j.pax.url.mvn.internal.Connection] :   Resolving exact version
          [org.ops4j.pax.url.mvn.internal.Connection] : Collecting versions from repository [http://repository.springsource.com/maven/bundles/external/,releases=true,snapshots=false]
          [org.ops4j.pax.url.mvn.internal.Connection] :   Resolving exact version
                  __ __                  ____      
                 / //_/____ __________ _/ __/      
                / ,<  / __ `/ ___/ __ `/ /_        
               / /| |/ /_/ / /  / /_/ / __/        
              /_/ |_|\__,_/_/   \__,_/_/         
          
            Apache Karaf (2.2.5)
          
          Hit '<tab>' for a list of available commands
          and '[cmd] --help' for help on a specific command.
          Hit '<ctrl-d>' or 'osgi:shutdown' to shutdown Karaf.
          
          karaf@root> features:addurl mvn:org.apache.whirr.karaf/apache-whirr/0.8.0-SNAPSHOT/xml/features
          
          features:listurl
           Loaded   URI 
            true    file:/Users/dralves/WorkApplications/OSS/whirr-git/platforms/karaf/itests/target/paxexam/1d1d5b2a-6ec7-4913-a35a-dc42072a41b3/examfeatures.xml
            true    mvn:org.apache.whirr.karaf/apache-whirr/0.8.0-SNAPSHOT/xml/features
            true    mvn:org.apache.karaf.assemblies.features/enterprise/2.2.5/xml/features
            true    mvn:org.apache.karaf.assemblies.features/standard/2.2.5/xml/features
          
          features:list
          State         Version           Name                          Repository                 Description
          [installed  ] [2.3.0.M1       ] exam                          pax-exam-features-2.3.0.M1 
          [uninstalled] [0.8.0-SNAPSHOT ] whirr                         repo-0                     Apache Whirr Core
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-cassandra               repo-0                     Apache Whirr Cassandra Service
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-chef                    repo-0                     Apache Whirr Chef Service
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-elasticsearch           repo-0                     Apache Whirr ElasticSearch Service
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-ganglia                 repo-0                     Apache Whirr Ganglia Service
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-hadoop                  repo-0                     Apache Whirr Hadoop Service
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-hbase                   repo-0                     Apache Whirr Hbase Service
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-hama                    repo-0                     Apache Whirr Hama Service
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-mahout                  repo-0                     Apache Whirr Mahout Service
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-puppet                  repo-0                     Apache Whirr Puppet Service
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-voldemort               repo-0                     Apache Whirr Voldermort Service
          [uninstalled] [0.8.0-SNAPSHOT ] whirr-zookeeper               repo-0                     Apache Whirr Zookeeper Service
          [uninstalled] [0.3            ] transaction                   karaf-enterprise-2.2.5     OSGi Transaction Manager
          [uninstalled] [0.3            ] jpa                           karaf-enterprise-2.2.5     OSGi Persistence Container
          [uninstalled] [0.3            ] jndi                          karaf-enterprise-2.2.5     OSGi Service Registry JNDI access
          [uninstalled] [0.3            ] application-without-isolation karaf-enterprise-2.2.5     
          [uninstalled] [2.2.5          ] karaf-framework               karaf-2.2.5                
          [uninstalled] [2.5.6.SEC02    ] spring                        karaf-2.2.5                
          [uninstalled] [2.5.6.SEC02    ] spring-web                    karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring                        karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-aspects                karaf-2.2.5                
          [uninstalled] [1.2.1          ] spring-dm                     karaf-2.2.5                
          [uninstalled] [1.2.1          ] spring-dm-web                 karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-instrument             karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-jdbc                   karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-jms                    karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-struts                 karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-test                   karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-orm                    karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-oxm                    karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-tx                     karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-web                    karaf-2.2.5                
          [uninstalled] [3.0.6.RELEASE  ] spring-web-portlet            karaf-2.2.5                
          [uninstalled] [2.2.5          ] wrapper                       karaf-2.2.5                
          [uninstalled] [2.2.5          ] obr                           karaf-2.2.5                
          [installed  ] [2.2.5          ] config                        karaf-2.2.5                
          [uninstalled] [7.5.4.v20111024] jetty                         karaf-2.2.5                
          [uninstalled] [2.2.5          ] http                          karaf-2.2.5                
          [uninstalled] [2.2.5          ] war                           karaf-2.2.5                
          [installed  ] [2.2.5          ] kar                           karaf-2.2.5                
          [uninstalled] [2.2.5          ] webconsole-base               karaf-2.2.5                
          [uninstalled] [2.2.5          ] webconsole                    karaf-2.2.5                
          [installed  ] [2.2.5          ] ssh                           karaf-2.2.5                
          [installed  ] [2.2.5          ] management                    karaf-2.2.5                
          [uninstalled] [2.2.5          ] eventadmin                    karaf-2.2.5                
          [uninstalled] [2.2.5          ] jasypt-encryption             karaf-2.2.5                
          
          features:install whirr
          java.lang.Exception: Could not start bundle mvn:org.jclouds.driver/jclouds-sshj/1.3.1 in feature(s) jclouds-driver-sshj-1.3.1: Unresolved constraint in bundle jclouds-sshj [75]: Unable to resolve 75.0: missing requirement [75.0] package; (&(package=org.jclouds.compute.domain)(version>=1.3.1))
          	at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeatures(FeaturesServiceImpl.java:356)
          	at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:283)
          	at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:279)
          	at org.apache.karaf.features.command.InstallFeatureCommand.doExecute(InstallFeatureCommand.java:62)
          	at org.apache.karaf.features.command.FeaturesCommandSupport.doExecute(FeaturesCommandSupport.java:39)
          	at org.apache.karaf.shell.console.OsgiCommandSupport.execute(OsgiCommandSupport.java:38)
          	at org.apache.felix.gogo.commands.basic.AbstractCommand.execute(AbstractCommand.java:35)
          	at org.apache.felix.gogo.runtime.CommandProxy.execute(CommandProxy.java:78)
          	at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:474)
          	at org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:400)
          	at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108)
          	at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:183)
          	at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:120)
          	at org.apache.felix.gogo.runtime.CommandSessionImpl.execute(CommandSessionImpl.java:89)
          	at org.apache.whirr.karaf.itest.WhirrKarafTestSupport$1.call(WhirrKarafTestSupport.java:146)
          	at org.apache.whirr.karaf.itest.WhirrKarafTestSupport$1.call(WhirrKarafTestSupport.java:140)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:680)
          Caused by: org.osgi.framework.BundleException: Unresolved constraint in bundle jclouds-sshj [75]: Unable to resolve 75.0: missing requirement [75.0] package; (&(package=org.jclouds.compute.domain)(version>=1.3.1))
          	at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3446)
          	at org.apache.felix.framework.Felix.startBundle(Felix.java:1734)
          	at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:918)
          	at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:905)
          	at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeatures(FeaturesServiceImpl.java:353)
          	... 23 more
          osgi:list
          START LEVEL 100 , List Threshold: 50
             ID   State         Blueprint      Level  Name
          [  49] [Active     ] [            ] [   60] OPS4J Pax Exam - Extender Service (2.3.0.M1)
          [  50] [Active     ] [            ] [   60] OPS4J Pax Exam - Remote Bundle Context (2.3.0.M1)
          [  51] [Active     ] [            ] [   60] wrap_mvn_junit_junit_4.10 (0)
          [  52] [Active     ] [            ] [   60] OPS4J Pax Exam - JUnit Probe Invoker (2.3.0.M1)
          [  53] [Active     ] [            ] [   60] OPS4J Pax Exam :: Karaf :: Test Options (0.4.0)
          [  54] [Active     ] [            ] [   60] Apache Geronimo JSR-330 Spec API (1.0)
          [  55] [Active     ] [            ] [   60] OPS4J Pax Exam - Injection (2.3.0.M1)
          [  56] [Active     ] [            ] [   60] PAXEXAM-PROBE-3dab3fa4-6009-46a8-b2b7-02fa81ef9fef
          
          Bundle: org.apache.felix.framework
          Bundle: org.ops4j.pax.url.wrap
          Bundle: org.ops4j.pax.url.mvn
          Bundle: org.ops4j.pax.logging.pax-logging-service
          Bundle: org.ops4j.pax.logging.pax-logging-api
          Bundle: org.apache.felix.configadmin
          Bundle: org.apache.felix.fileinstall
          Bundle: org.apache.aries.proxy
          Bundle: org.apache.aries.util
          Bundle: org.apache.aries.blueprint
          Bundle: org.apache.servicemix.bundles.asm
          Bundle: org.apache.karaf.deployer.blueprint
          Bundle: org.apache.karaf.diagnostic.management
          Bundle: org.apache.karaf.admin.management
          Bundle: org.apache.karaf.shell.console
          Bundle: org.apache.karaf.deployer.kar
          Bundle: org.apache.karaf.features.core
          Bundle: org.apache.karaf.diagnostic.command
          Bundle: sshd-core
          Bundle: org.apache.aries.jmx.blueprint
          Bundle: org.apache.karaf.management.server
          Bundle: org.apache.karaf.deployer.wrap
          Bundle: org.apache.karaf.shell.dev
          Bundle: org.apache.karaf.admin.command
          Bundle: org.apache.aries.jmx
          Bundle: org.apache.karaf.deployer.spring
          Bundle: org.apache.karaf.features.command
          Bundle: org.apache.karaf.shell.packages
          Bundle: org.apache.karaf.shell.osgi
          Bundle: org.apache.karaf.diagnostic.core
          Bundle: org.apache.mina.core
          Bundle: org.apache.karaf.jaas.config
          Bundle: org.apache.karaf.shell.ssh
          Bundle: org.apache.karaf.admin.core
          Bundle: org.apache.karaf.deployer.features
          Bundle: org.apache.karaf.jaas.command
          Bundle: org.apache.karaf.diagnostic.common
          Bundle: org.apache.karaf.shell.commands
          Bundle: org.apache.karaf.features.management
          Bundle: org.apache.karaf.shell.log
          Bundle: org.apache.karaf.jaas.modules
          Bundle: org.apache.karaf.shell.config
          Bundle: org.apache.karaf.management.mbeans.system
          Bundle: org.apache.karaf.management.mbeans.bundles
          Bundle: org.apache.karaf.management.mbeans.services
          Bundle: org.apache.karaf.management.mbeans.config
          Bundle: org.apache.karaf.management.mbeans.log
          Bundle: org.apache.karaf.management.mbeans.packages
          Bundle: org.apache.karaf.management.mbeans.dev
          Bundle: org.ops4j.pax.exam.extender.service
          Bundle: org.ops4j.pax.exam.rbc
          Bundle: wrap_mvn_junit_junit_4.10
          Bundle: org.ops4j.pax.exam.invoker.junit
          Bundle: org.openengsb.labs.paxexam.karaf.paxexam-karaf-options
          Bundle: org.apache.geronimo.specs.geronimo-atinject_1.0_spec
          Bundle: org.ops4j.pax.exam.inject
          Bundle: PAXEXAM-PROBE-3dab3fa4-6009-46a8-b2b7-02fa81ef9fef
          13010 [main] ERROR org.ops4j.pax.exam.junit.JUnit4TestRunner - Exception
          org.ops4j.pax.exam.TestContainerException: [testInstallation(org.apache.whirr.karaf.itest.WhirrInstallationTest): Bundle org.apache.whirr.karaf.commands does not exist]
          	at org.ops4j.pax.exam.invoker.junit.internal.JUnitProbeInvoker.invokeViaJUnit(JUnitProbeInvoker.java:112)
          	at org.ops4j.pax.exam.invoker.junit.internal.JUnitProbeInvoker.findAndInvoke(JUnitProbeInvoker.java:89)
          	at org.ops4j.pax.exam.invoker.junit.internal.JUnitProbeInvoker.call(JUnitProbeInvoker.java:72)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          	at java.lang.reflect.Method.invoke(Method.java:597)
          	at org.ops4j.pax.exam.rbc.internal.RemoteBundleContextImpl.remoteCall(RemoteBundleContextImpl.java:86)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          	at java.lang.reflect.Method.invoke(Method.java:597)
          	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
          	at sun.rmi.transport.Transport$1.run(Transport.java:159)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
          	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
          	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
          	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:680)
          Caused by: java.lang.RuntimeException: Bundle org.apache.whirr.karaf.commands does not exist
          	at org.apache.whirr.karaf.itest.WhirrKarafTestSupport.getInstalledBundle(WhirrKarafTestSupport.java:214)
          	at org.apache.whirr.karaf.itest.WhirrInstallationTest.testInstallation(WhirrInstallationTest.java:46)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          	at java.lang.reflect.Method.invoke(Method.java:597)
          	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
          	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
          	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
          	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
          	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
          	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
          	at org.ops4j.pax.exam.invoker.junit.internal.ContainerTestRunner.runChild(ContainerTestRunner.java:58)
          	at org.ops4j.pax.exam.invoker.junit.internal.ContainerTestRunner.runChild(ContainerTestRunner.java:32)
          	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
          	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
          	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
          	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
          	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
          	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
          	at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
          	at org.junit.runner.JUnitCore.run(JUnitCore.java:136)
          	at org.ops4j.pax.exam.invoker.junit.internal.JUnitProbeInvoker.invokeViaJUnit(JUnitProbeInvoker.java:108)
          	... 21 more
          Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.149 sec <<< FAILURE!
          
          
          David Alves added a comment -

          Seems like the build failure is transient. It sometimes fails in Karaf's itests, on both the trunk and whirr-221 branches, and sometimes it doesn't. I'm not familiar enough with the Pax framework to decode it at first sight.

          David Alves added a comment -

          Updated to latest trunk.

          Maven build fails in karaf.

          I'm not sure it is related to this patch (some jclouds bundle is missing). The full error is attached.

          David Alves added a comment - - edited

          Dependencies are specified as follows:

          public class HBaseMasterClusterActionHandler {
            ...
            public Set<String> getRequiredRoles() {
              return ImmutableSet.of("zookeeper");
            }
            ...
          }

          Dependencies are overridden in configuration as follows:

          whirr.role-dependency.hbase-master=

          (to set the dependency to none)

          There is a main system to run ClusterActionEvents. Before any stage is executed, the ClusterActionEvents are built and separated into stages whose events can be executed in parallel, and a Callable is attached to each event. After all beforeAction() methods have been called, the stages are submitted and each Callable is executed according to its stage.

          The next stage won't begin before all the scripts of the current stage complete. For the cases where a startup script returns before the service is actually active, handlers can override getOnlineDelayMillis() and return a positive value. In that case, after each stage Whirr waits for the largest of the delays before executing the next stage's scripts.
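
A minimal sketch of the stage loop described above, assuming each role's action is reduced to a Callable plus an optional online delay. StagedRunnerSketch, RoleTask and runStages are hypothetical names used for illustration only; they are not the classes in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class StagedRunnerSketch {

    /** Hypothetical stand-in for one role's script plus the value of getOnlineDelayMillis(). */
    static final class RoleTask {
        final Callable<Void> script;
        final long onlineDelayMillis;

        RoleTask(Callable<Void> script, long onlineDelayMillis) {
            this.script = script;
            this.onlineDelayMillis = onlineDelayMillis;
        }
    }

    /** Runs the stages one after another; tasks inside a stage are submitted together. */
    static void runStages(List<List<RoleTask>> stages, ExecutorService pool) throws Exception {
        for (List<RoleTask> stage : stages) {
            List<Callable<Void>> scripts = new ArrayList<>();
            long maxDelay = 0;
            for (RoleTask task : stage) {
                scripts.add(task.script);
                maxDelay = Math.max(maxDelay, task.onlineDelayMillis);
            }
            // invokeAll blocks until every script of this stage has finished.
            for (Future<Void> result : pool.invokeAll(scripts)) {
                result.get(); // rethrows a script failure as ExecutionException
            }
            // Wait for the largest delay requested in this stage, if any.
            if (maxDelay > 0) {
                Thread.sleep(maxDelay);
            }
        }
    }
}
```

With a stage containing only a zookeeper task followed by a stage containing only an hbase-master task, the second script is not submitted until the first has returned and its delay has elapsed.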

          Maven build passes. Dependency analyzer is unit tested.

          Do you think we need more unit tests or can we proceed to itests?

          There are some corner cases that are still not implemented, namely cyclic dependencies and multiple paths (a<-b, a<-c, b<-c), but I can't think of a case where they are required, so I decided to leave them for later.
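
For reference, a minimal sketch of how stage building could cover both corner cases, assuming the dependencies are available as a map from each role to the roles it requires. StageSketch and stages are hypothetical names; this is not the analyzer in the patch, just one possible layering approach that also fails fast on a missing dependency.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class StageSketch {

    /**
     * Groups roles into ordered stages. A role is placed in the first stage
     * in which all the roles it requires already belong to earlier stages.
     */
    static List<Set<String>> stages(Map<String, Set<String>> requires) {
        // Work on a sorted copy so the result is deterministic.
        Map<String, Set<String>> pending = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : requires.entrySet()) {
            pending.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        // Fail fast when a required role is not part of the cluster.
        for (Map.Entry<String, Set<String>> e : pending.entrySet()) {
            for (String dep : e.getValue()) {
                if (!pending.containsKey(dep)) {
                    throw new IllegalArgumentException(
                        "Role " + e.getKey() + " requires missing role " + dep);
                }
            }
        }
        List<Set<String>> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (!pending.isEmpty()) {
            Set<String> stage = new TreeSet<>();
            for (Map.Entry<String, Set<String>> e : pending.entrySet()) {
                if (done.containsAll(e.getValue())) {
                    stage.add(e.getKey());
                }
            }
            // No role can make progress: the remaining roles form a cycle.
            if (stage.isEmpty()) {
                throw new IllegalStateException(
                    "Cyclic dependency among roles " + pending.keySet());
            }
            pending.keySet().removeAll(stage);
            done.addAll(stage);
            result.add(stage);
        }
        return result;
    }
}
```

Reading a<-b as "b requires a", the multiple-path case (b requires a; c requires a and b) yields three stages: a, then b, then c. Two roles that require each other produce an IllegalStateException instead of looping forever.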

          Please review/comment/test/suggest more tests.

          David Alves added a comment -

          Unit tests pass (the Maven build fails on an encrypted pk problem, but so does trunk).
          Complies with RAT and Checkstyle.

          Still not feature complete (missing the wait time between phases and the configuration overrides), but it should be ready for the next phase (itests).

          David Alves added a comment -

          First iteration patch, still not tested (and not compiling because of BYON action):

          • Added the concept of dependencies.
          • Added a dependency analyzer that creates stages: each stage is a set of roles to be executed across instance templates before the next stage can begin.
          • Moved the non-script-specific parts of ScriptBasedClusterAction to ClusterAction, including a generic stage execution system (each action sets a Callable in the ClusterActionEvent and this Callable is executed by ClusterAction on a per-stage basis).
          • All actions that inherit from ScriptBasedClusterAction are now stage oriented; the only one that is not is BootstrapClusterAction.
          • Took the chance to remove the deprecated call of ScriptBasedClusterAction.

          To Come:

          • DryRunTests (ongoing)
          • Perform the Itests
          • Finish BYON (which should be stage aware)
          David Alves added a comment -

          First Iteration, BYON still not compiling and not tested.

          Andrei Savu added a comment -

          Still, do you think we should ignore missing dependencies?

          You are right! As long as we allow the user to override the dependency graph, we should fail fast with a helpful error message if any of the requested dependencies is missing.

          PS: nice work! I'm happy that we should be able to add this feature soon. It should make the deployment more robust.

          David Alves added a comment - - edited

          Wrt the waitTime: agreed.

          Wrt implicit dependencies: I wasn't thinking of including them in this JIRA.

          Still, do you think we should ignore missing dependencies? I get that we should be able to override dependencies for cases like embedded ZooKeeper in HBase, but for most cases we should at least warn the user and give the option to abort. If someone created a 1 nn, 100 jt+dn cluster (forgot the tasktracker), they are about to create a pretty useless and expensive cluster.

          Overriding the implicit graph sounds good. I think we should also give the option of overriding on a per-service basis (e.g. for the case of embedded zk).

          Andrei Savu added a comment -

          ... and the new ScriptBasedClusterAction looks good to me.

          Andrei Savu added a comment -

          I get that we need timeouts, as some services' scripts will return success before the service is actually running. How do you suggest we implement this? My idea would be that each handler could (optionally) return a waitTime for the service to be available; at each stage, after all scripts were executed, Whirr would wait for the longest of these times. I mean, if services have no need for a wait time then why wait.

          Sounds reasonable as long as we make this optional and we have a global default value. If it's easier for you the global value should be good enough for now.
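
          A minimal sketch of that rule, assuming each handler reports its wait time as an OptionalLong in milliseconds and that a handler reporting nothing falls back to the global default. The class and method names are hypothetical and do not come from the patch.

```java
import java.util.Collection;
import java.util.OptionalLong;

// Hypothetical helper, not part of Whirr: picks how long to pause after a stage.
final class StageWaitSketch {

  // Returns the longest wait requested by any handler in the stage.
  // A handler that returns OptionalLong.empty() uses the global default;
  // a handler that returns 0 explicitly asks for no wait.
  static long stageWaitMillis(Collection<OptionalLong> handlerWaits, long globalDefaultMillis) {
    long longest = 0L;
    for (OptionalLong wait : handlerWaits) {
      longest = Math.max(longest, wait.orElse(globalDefaultMillis));
    }
    return longest;
  }
}
```

          With this choice a stage whose handlers all declare a wait of 0 does not sleep at all, which matches the "why wait" remark; whether an empty stage should use the default instead is left open here.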

          Something else came to mind while implementing dependencies: default service configuration and implicit roles. Let me clarify: if we start an HBase cluster we need a ZooKeeper cluster, and it would be nice if Whirr could figure that out and install ZooKeeper without the user explicitly telling it to. In this case the ZooKeeper handler could return a typical layout for ZooKeeper depending on cluster size (clusters of fewer than 3 nodes: 1 zk node; 3 <= cluster < 5: 3 zk nodes; cluster >= 5: 5 zk nodes). And this is a complex case; more often than not, services can simply be installed on all nodes.

          I don't think we should go that far for the first implementation. Let's try to keep things simple & explicit (at least for now) so that we can focus on releasing this as part of 0.7.1. BTW HBase can also use an embedded ZooKeeper server so I think if the dependent service is missing it should be ignored by the dependency analyser. What do you think?

          And finally I think we should also have the following option as a way to override the implicit graph:

          whirr.role-order=role1+role2,role3,role4,role5

          Let me know what you think. BTW I can help testing this (manually or by writing tests).

          David Alves added a comment -

          Turns out quite a lot has to change for this to happen.

          For one, ClusterActionEvents no longer have a one-to-one relationship to InstanceTemplates. Instead each template will have multiple events, one for each stage; this can be seen in the execute() method. Also, the doAction() method had to change signature.

          Andrei:
          I get that we need timeouts, as some services' scripts will return success before the service is actually running. How do you suggest we implement this? My idea would be that each handler could (optionally) return a waitTime for the service to be available; at each stage, after all scripts were executed, Whirr would wait for the longest of these times.
          I mean, if services have no need for a wait time then why wait.

          Something else came to mind while implementing dependencies: default service configuration and implicit roles. Let me clarify: if we start an HBase cluster we need a ZooKeeper cluster, and it would be nice if Whirr could figure that out and install ZooKeeper without the user explicitly telling it to. In this case the ZooKeeper handler could return a typical layout for ZooKeeper depending on cluster size (clusters of fewer than 3 nodes: 1 zk node; 3 <= cluster < 5: 3 zk nodes; cluster >= 5: 5 zk nodes). And this is a complex case; more often than not, services can simply be installed on all nodes.
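
          The sizing rule in the paragraph above can be written down as a small function. This is only a sketch of the thresholds as stated in the comment; the class and method names are hypothetical, and no such method is known to exist on the ZooKeeper handler.

```java
// Hypothetical helper, not part of Whirr: suggested ZooKeeper ensemble size
// for a cluster of the given size, using the thresholds from the comment.
final class ZooKeeperLayoutSketch {

  static int zookeeperNodes(int clusterSize) {
    if (clusterSize < 1) {
      throw new IllegalArgumentException("clusterSize must be positive: " + clusterSize);
    }
    if (clusterSize < 3) {
      return 1; // fewer than 3 nodes: a single ZooKeeper node
    }
    if (clusterSize < 5) {
      return 3; // 3 or 4 nodes: a 3-node ensemble
    }
    return 5;   // 5 or more nodes: a 5-node ensemble
  }
}
```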

          have a look at:
          https://github.com/dralves/whirr/blob/7227fb67a7b8623000df59fae3026713eb0f868f/core/src/main/java/org/apache/whirr/actions/ScriptBasedClusterAction.java

          Tom White added a comment -

          BSD is fine: http://www.apache.org/legal/resolved.html#category-a

          Andrei Savu added a comment -

          Looks great! I'm really happy that you are working on this one.

          A few comments:

          • on line 51 in ClusterActionHandlerSupport.java I think we want ImmutableSet.of()
          • we need a way to specify a sleep interval between execution stages (we are going to replace this later with a service status check)

          I think BSD is compatible with ASL. Tom?
          http://programmers.stackexchange.com/questions/40561/is-bsd-license-compatible-with-apache

          David Alves added a comment -

          BTW I'm using Jung (a graph library), which is BSD licensed. Is this a problem? The graph stuff is so simple I could easily reimplement it if needed.

          David Alves added a comment - - edited

          Hi

          I'm working on this, I started from scratch a while ago and I just now got the time to start to put things together.

          Andrei: I would prefer not to use annotations since dependencies might depend on configuration.

          So the idea is:

          Add a getDependedOnRole() method to ClusterActionController. This method returns the set of roles this role needs before it can start. This might be fixed (e.g. hard-coded into the ClusterActionController class) or it might be configurable through the properties file.

          In each phase (configure, bootstrap, start, etc.) the Action calls the dependency analyzer, which returns a set of stages. Each stage has a set of roles that can be started in parallel and must finish before the next stage.

          The dependency graph is built based on calls to getDependedOnRole(). The roots are services that depend on no one, the next stage is composed of the roots' children, etc.
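
          For readers who want to see the staging idea in code, here is a rough sketch. It is not the DependencyAnalyzer from the branch and does not use Jung; it assumes the dependencies are already available as a plain map from role name to required role names, and it places a role in the first stage that follows all of its dependencies. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the DependencyAnalyzer from the patch:
// groups roles into stages that can each be started in parallel.
final class StageSketch {

  // dependsOn maps every role in the cluster to the roles it needs first.
  static List<Set<String>> stages(Map<String, Set<String>> dependsOn) {
    // Work on a copy so the caller's map is left untouched.
    Map<String, Set<String>> remaining = new LinkedHashMap<>();
    for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
      remaining.put(entry.getKey(), new LinkedHashSet<>(entry.getValue()));
    }

    List<Set<String>> stages = new ArrayList<>();
    while (!remaining.isEmpty()) {
      // Roles with no unsatisfied dependency left form the next stage.
      Set<String> ready = new LinkedHashSet<>();
      for (Map.Entry<String, Set<String>> entry : remaining.entrySet()) {
        if (entry.getValue().isEmpty()) {
          ready.add(entry.getKey());
        }
      }
      if (ready.isEmpty()) {
        // Nothing can start: a cycle, or a dependency on a role that is not in the map.
        throw new IllegalStateException(
            "Cyclic or unsatisfiable dependencies among: " + remaining.keySet());
      }
      remaining.keySet().removeAll(ready);
      for (Set<String> dependencies : remaining.values()) {
        dependencies.removeAll(ready);
      }
      stages.add(ready);
    }
    return stages;
  }
}
```

          With zookeeper and hadoop-namenode as roots, hadoop-datanode depending on the namenode, hbase-master depending on both roots and hbase-regionserver depending on hbase-master, this yields three stages: the two roots, then datanode and master, then the region servers.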

          Still too early for a patch as I'm still finishing things up, but if you guys could take a look at the dependency analyzer, your feedback would be much appreciated. Here are the GitHub links:

          https://github.com/dralves/whirr/blob/whirr-221/core/src/main/java/org/apache/whirr/service/DependencyAnalyzer.java
          https://github.com/dralves/whirr/blob/whirr-221/core/src/main/java/org/apache/whirr/service/ClusterActionHandlerSupport.java
          https://github.com/dralves/whirr/blob/whirr-221/core/src/test/java/org/apache/whirr/service/DependencyAnalyzerTest.java

          To come: the changes to ScriptBasedClusterAction (ongoing) and the dryrun test.

          Andrei Savu added a comment -

          This feature is even more important now that we are running configuration & start scripts in parallel on all instances. I'm seeing transient integration test failures quite often.

          Andrei Savu added a comment -

          Adding this on the roadmap for 0.8.0.

          Andrei Savu added a comment -

          How about having an annotation on the role action handler methods?

          
            @RunBefore(roles = {"hbase-master", "hadoop-namenode", "my-custom-app"})
            @Override
            protected void beforeConfigure(ClusterActionEvent event) throws IOException {
              ....
            }
          
          

          and some extra methods like before/afterStart, before/afterStop, before/afterCleanup? (I will open another JIRA)

          Andrei Savu added a comment -

          supposing all services are able to wait until the services on which they depend are available, is there still a need for ordered service startup?

          That should be a good enough strategy for solving this kind of problem, at least for now.

          Bruno Dumon added a comment -

          I thought about implementing this in the context of WHIRR-334, but finally was able to use another solution there (polling for HDFS availability before starting HBase).

          I have been thinking a bit about the following: supposing all services are able to wait until the services on which they depend are available, is there still a need for ordered service startup?

          I think there might be. For example: the HBase master waits at startup for region servers to appear, until their number has remained stable for the last 4.5s by default (see ServerManager.waitForRegionServers); you can also configure HBase to wait for a minimum number of region servers to be available at startup. I'm not sure about the details, but I assume that after this time it will start reassigning the regions of unavailable region servers. On a fresh cluster, there won't be any regions yet, so this would seem unimportant. However, I'm planning to install my own service using Whirr which at startup creates tables in HBase with initial region splits. If not all region servers are available yet, HBase will afterwards have to rebalance those regions (and possibly the same for the corresponding HDFS files).

          I've seen some of the configure scripts do a sleep after starting a service (e.g. the hadoop datanode), which is also a way of stating that order is important. If startup order were controlled by Whirr, these kinds of sleeps (or more intelligent checks of service availability such as a port check) between startup of dependent roles could be handled at that level, which is easier to maintain.

          To make things robust for larger clusters, we should not impose an absolute startup order but rather wait for a minimum % of started processes for some role.

          Regardless of startup order, it would be a good thing if Whirr had knowledge of the dependencies between service roles and also of roles that should have only one instance. This would allow validating the cluster template early on.
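
          As an illustration of the "minimum %" idea, the check below decides whether enough processes of a role have started for dependent roles to proceed. It is a sketch only: the names are hypothetical, and how Whirr would count started processes is not specified anywhere in this issue.

```java
// Hypothetical helper, not part of Whirr: decides whether a role has enough
// started processes for the roles that depend on it to go ahead.
final class StartQuorumSketch {

  // minPercent is a whole-number percentage between 0 and 100.
  static boolean enoughStarted(int started, int expected, int minPercent) {
    if (minPercent < 0 || minPercent > 100) {
      throw new IllegalArgumentException("minPercent must be in [0, 100]: " + minPercent);
    }
    if (expected <= 0) {
      return true; // nothing was requested for this role, so nothing to wait for
    }
    // Integer arithmetic avoids floating-point rounding at the threshold.
    return started * 100L >= (long) expected * minPercent;
  }
}
```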

          David Alves added a comment -

          Altered the patch to match the current trunk state and removed changes that regarded testing only (now at WHIRR-243).

          Tom White added a comment -

          Thanks for the contribution David. This is great work, but I'm struggling to see the use case that motivates the feature. It came up in the context of HBase, but HBase works without it (as far as I know). I'm hesitant to add this code without having at least one service that uses it. Lars/Andrei, perhaps you can shed some more light? Thanks.

          David Alves added a comment -

          Forgot to add the ASF headers.

          David Alves added a comment - - edited

          Added the ability to specify whirr.role-order exactly as defined in the description of the issue. Unit tests are passing fine. Please review!

          When a user specifies:
          whirr.role-order=role1+role2,role3,role4,role5
          each role group will start in order, i.e.:
          role1+role2
          role3
          role4
          role5

          When nothing is specified, all roles are started at the same time (per InstanceTemplate groups).
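
          To make the format concrete, here is a sketch of how such a value could be split into ordered role groups. It is not the parsing code added to ClusterSpec in the patch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the ClusterSpec code from the patch:
// turns "role1+role2,role3" into [[role1, role2], [role3]].
final class RoleOrderSketch {

  static List<Set<String>> parse(String value) {
    List<Set<String>> groups = new ArrayList<>();
    if (value == null || value.trim().isEmpty()) {
      return groups; // property not set: no ordering is imposed
    }
    for (String group : value.split(",")) {   // ',' separates consecutive groups
      Set<String> roles = new LinkedHashSet<>();
      for (String role : group.split("\\+")) { // '+' joins roles started together
        String name = role.trim();
        if (!name.isEmpty()) {
          roles.add(name);
        }
      }
      if (!roles.isEmpty()) {
        groups.add(roles);
      }
    }
    return groups;
  }
}
```

          An empty result stands for the unordered case described above, where all roles start at the same time.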

          Limitations:
          I've seen some inconsistencies when whirr.instance-templates has mixed roles, i.e.:
          1 role1+role2,2 role3,3 role4+role5
          works fine
          1 role1+role2,2 role3+role3,3 role1+role2+role4+role5
          does not. Must look into it further...

          • Have not made integration tests as of yet (requires further investigation to make the correct assertions)

          Remarks:

          • Added a completely mocked/stubbed ComputeServiceContext to allow running cluster startup in unit tests.
          • Changed the ClusterSpec to account for the new property (and to be able to parse it)
          • Added the possibility for ComputeServiceContextBuilder to return a "test" ComputeServiceContext (the stub)
          • Added test whirr*.properties to ease testing in the ordered and not ordered build cases.
          • Changed ConfigureClusterAction so that it now runs in parallel both in the unordered and ordered configure modes.
          Tibor Kiss added a comment - - edited

          What exactly does that mean? We have a BootstrapClusterAction and a ConfigureClusterAction. Is it enough to serialize according to the specified order only inside BootstrapClusterAction, or do BootstrapClusterAction and ConfigureClusterAction have to be serialized together, role by role?
          There are two places where the change has to take place: BootstrapClusterAction#doAction and eventually the parent ScriptBasedClusterAction#execute. The latter one raises a new question: is it important to serialize the handler.beforeAction and handler.afterAction event handlers in the same (ordered) way? That question is harder to answer, at least for me and for now.

          If it is enough to serialize according to the order only at the level of BootstrapClusterAction#doAction, then I would say that when I was fixing WHIRR-167 I was taking such a change into consideration and I tried to reduce the complexity of BootstrapClusterAction#doAction. WHIRR-167 is close to being committed to trunk (read the latest comments there); this change is much easier to schedule after WHIRR-167 has been merged into trunk.

          Anyway, let's answer the questions raised by the desired feature.


            People

            • Assignee:
              David Alves
              Reporter:
              Andrei Savu
            • Votes:
              4
              Watchers:
              6

              Dates

              • Created:
                Updated:

                Development