Whirr / WHIRR-52

Allow a Hadoop MapReduce job to be run against a Hadoop Service running on Rackspace Cloud Servers

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: service/hadoop
    • Labels: None
    Attachments

    1. WHIRR-52.patch
      6 kB
      Tom White
    2. WHIRR-52.patch
      11 kB
      Tom White
    3. WHIRR-52.patch
      16 kB
      Tom White

        Activity

        Jeff Hammerbacher added a comment -

        When running http://github.com/hammer/whirr-demo with a whirr.provider set to cloudservers, I get:

        Starting the cluster.
        Launching cluster.
        Cluster launched.
        Configuring Proxy.
        Starting Proxy.
        Proxy Started.
        Cluster started.
        Running MapReduce job.
        Warning: Permanently added '184-106-200-188.static.cloud-ips.com,184.106.200.188' (RSA) to the list of known hosts.
        10/06/28 20:15:37 INFO ipc.Client: Retrying connect to server: 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021. Already tried 0 time(s).
        10/06/28 20:15:38 INFO ipc.Client: Retrying connect to server: 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021. Already tried 1 time(s).
        10/06/28 20:15:39 INFO ipc.Client: Retrying connect to server: 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021. Already tried 2 time(s).
        10/06/28 20:15:40 INFO ipc.Client: Retrying connect to server: 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021. Already tried 3 time(s).
        10/06/28 20:15:41 INFO ipc.Client: Retrying connect to server: 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021. Already tried 4 time(s).
        10/06/28 20:15:42 INFO ipc.Client: Retrying connect to server: 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021. Already tried 5 time(s).
        10/06/28 20:15:43 INFO ipc.Client: Retrying connect to server: 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021. Already tried 6 time(s).
        10/06/28 20:15:44 INFO ipc.Client: Retrying connect to server: 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021. Already tried 7 time(s).
        10/06/28 20:15:45 INFO ipc.Client: Retrying connect to server: 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021. Already tried 8 time(s).
        10/06/28 20:15:46 INFO ipc.Client: Retrying connect to server: 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021. Already tried 9 time(s).
        Could not run job: Call to 184-106-200-188.static.cloud-ips.com/184.106.200.188:8021 failed on local exception: java.net.SocketException: Connection refused
        Finished MapReduce job.
        Bringing down the cluster.
        Cluster stopped.
        
        Jeff Hammerbacher added a comment -

        From the above error message, I think something funky is happening when we grab the public name of the JT (184-106-200-188.static.cloud-ips.com/184.106.200.188:8021 sure doesn't look right). Will debug a bit further, but if someone knows how the jclouds Rackspace Cloud Servers API works, that knowledge would be useful here.

        Jeff Hammerbacher added a comment -

        Wow, seeing some potentially weird behavior from Rackspace.

        In HadoopService.java, I've instrumented the system with my highly complex debugging tool:

            NodeMetadata node = Iterables.getOnlyElement(nodes);
            InetAddress namenodePublicAddress = Iterables.getOnlyElement(node.getPublicAddresses());
            System.out.println("NN: " + namenodePublicAddress);
            System.out.println("NN hostname: " + namenodePublicAddress.getHostName());
            InetAddress jobtrackerPublicAddress = Iterables.getOnlyElement(node.getPublicAddresses());
            System.out.println("JT: " + jobtrackerPublicAddress);
            System.out.println("JT hostname: " + jobtrackerPublicAddress.getHostName());
        

        Here's the output:

        NN: /184.106.196.148
        NN hostname: 184-106-196-148.static.cloud-ips.com
        JT: 184-106-196-148.static.cloud-ips.com/184.106.196.148
        JT hostname: 184-106-196-148.static.cloud-ips.com
        

        It seems that a call to .getHostName() is altering the state of the node object such that the second call to .getPublicAddresses() is different from the first. Perhaps that's known behavior, but I'm just recording my investigations here for posterity.

        On the other hand, mapred.job.tracker is set using the .getHostName() call, and the result of that call seems constant and reasonable, so I'm not quite sure where things are going wrong. More sophisticated debugging awaits...
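        The behaviour Jeff observed matches how java.net.InetAddress prints itself: toString() returns "hostname/literal-IP", with the hostname part empty until it is known, and on the JDKs of that era a getHostName() call performed a reverse DNS lookup and cached the result on the object, so a later toString() changes. A minimal stdlib-only sketch (illustrative, not Whirr code):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class InetAddressToStringDemo {
    public static void main(String[] args) throws UnknownHostException {
        // Build an address from raw bytes: no hostname is associated yet.
        InetAddress addr = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});

        // toString() is "hostname/literal-IP"; with no hostname it prints
        // "/127.0.0.1" -- the same shape as the "NN: /184.106.196.148" output above.
        System.out.println("before: " + addr);

        // getHostName() performs a reverse DNS lookup. On older JDKs the result
        // was cached on the InetAddress, so a subsequent toString() shows
        // "hostname/literal-IP" -- the shape of the "JT: ..." output above.
        System.out.println("hostname: " + addr.getHostName());
        System.out.println("after: " + addr);
    }
}
```

        Whether the second toString() changes depends on the JDK version's caching behaviour, which is consistent with this looking "potentially weird" rather than deterministic.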

        Jeff Hammerbacher added a comment - edited

        I've confirmed that mapred.job.tracker is set to 184-106-196-148.static.cloud-ips.com:8021 on the configuration object used to submit the MapReduce job to the cluster. Might the JobClient be picking up the bad address somewhere else, particularly from the instances set in the HadoopCluster object? That guy is using the .getPublicAddresses() URL instead of the .getHostName() URL, which would result in the error I'm seeing above.
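        The malformed address in the error ("184-106-200-188.static.cloud-ips.com/184.106.200.188:8021") is exactly what you get by string-concatenating an InetAddress instead of using its hostname. A hypothetical helper (the method name is illustrative, not Whirr's API) showing the safe and unsafe ways to build the job tracker address:

```java
import java.net.InetAddress;

public class JobTrackerAddress {
    /** Build a "host:port" address from the hostname, not InetAddress.toString(). */
    static String jobTrackerAddress(InetAddress jt, int port) {
        return jt.getHostName() + ":" + port;
    }

    public static void main(String[] args) throws Exception {
        InetAddress addr = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});

        // Correct: "hostname:port".
        System.out.println(jobTrackerAddress(addr, 8021));

        // Pitfall: concatenating the InetAddress itself yields "hostname/ip:port",
        // the malformed form seen in the connection-refused error above.
        System.out.println(addr + ":" + 8021);
    }
}
```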

        Tom White added a comment -

        This looks like a DNS problem. See the Python contrib scripts to see how DNS is set up for Rackspace: I fear we'll need a similar approach here unless anyone has a better idea.

        Tom White added a comment -

        I had a look at this and hit another problem (this is with ZooKeeper, which is a simpler service to launch, so I started with that one):

        Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.141 sec <<< FAILURE!
        test(org.apache.whirr.service.zookeeper.integration.ZooKeeperServiceTest)  Time elapsed: 134.007 sec  <<< ERROR!
        java.io.IOException: org.jclouds.compute.RunScriptOnNodesException: error runScript on filtered nodes options(RunScriptOptions [overridingCredentials=true, runAsRoot=true])
        Execution failures:
        
        0 error[s]
        Node failures:
        
        1) SshException on node 337685:
        org.jclouds.ssh.SshException: root@173.203.210.13:22: Error connecting to session.
                at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
                at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:199)
                at org.jclouds.compute.internal.BaseComputeService$3.call(BaseComputeService.java:357)
                at org.jclouds.compute.internal.BaseComputeService$3.call(BaseComputeService.java:346)
                at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:637)
        Caused by: com.jcraft.jsch.JSchException: Auth fail
                at com.jcraft.jsch.Session.connect(Session.java:452)
                at com.jcraft.jsch.Session.connect(Session.java:150)
                at org.jclouds.ssh.jsch.JschSshClient.newSession(JschSshClient.java:247)
                at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:184)
                ... 7 more
        

        Rackspace doesn't return the credentials for nodes after the initial launch, so I tried to override the credentials with the private key.
        I'm not sure why there is an authentication error, since I was able to ssh in manually using the same private key.

        Adrian Cole added a comment -

        some thoughts:

        1. rackspace returns ip addresses, not dns hostnames
        2. rackspace returns multiple ip addresses
        ex.
        "public" : [
        "67.23.10.132",
        "67.23.10.131"
        ],

        I'd recommend changing our fields that refer to public/private ip addresses to lists (if not already), and using predictable tools for determining whether something is a hostname or not.
        ex. http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/net/InternetDomainName.html

        mind isn't free enough to debug further at the moment!
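        Adrian's suggestion is to classify an address string with a predictable check (Guava's InternetDomainName, linked above, is one such tool) rather than relying on InetAddress resolution. A minimal stdlib-only sketch of the idea, using a dotted-quad test (the class and method are hypothetical):

```java
import java.util.regex.Pattern;

public class AddressKind {
    // Matches IPv4 dotted-quad literals like "67.23.10.132".
    private static final Pattern IPV4 =
        Pattern.compile("^(\\d{1,3})(\\.\\d{1,3}){3}$");

    /** True if s looks like an IPv4 literal rather than a DNS hostname. */
    static boolean isIpv4Literal(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        // An IP literal, as Rackspace returns:
        System.out.println(isIpv4Literal("67.23.10.132"));               // → true
        // A hostname, as seen on the EC2 side:
        System.out.println(isIpv4Literal("184-106-200-188.static.cloud-ips.com")); // → false
    }
}
```

        Guava's InternetDomainName adds proper validation against the public suffix list, which a regex like this does not attempt.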

        Tom White added a comment -

        Thanks for the ideas, Adrian. I'm not sure the IP address is the problem since I can manually ssh to the node using the IP address. (But I agree that at some point we should change to something that gives more control over resolution than InetAddress.)

        Tom White added a comment -

        I finally managed to get this working. The problem was a private key with a non-empty passphrase, which was fine on EC2, but not Rackspace (not sure why). I've added some documentation warning about this.
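        Given Tom's finding, the private key used with Rackspace must have an empty passphrase. A sketch of generating a dedicated passphrase-less keypair (the file path is illustrative):

```shell
# Generate a dedicated RSA keypair with an empty passphrase (-N "");
# the path is illustrative.
ssh-keygen -t rsa -N "" -f ~/.ssh/whirr_rsa

# Sanity check: a passphrase-less key can be read without prompting.
ssh-keygen -y -f ~/.ssh/whirr_rsa > /dev/null && echo "no passphrase required"
```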

        The ZooKeeper test works, but Cassandra doesn't yet (it's listening on the wrong interface, I think). Hadoop also fails, repeatedly, with the following error:

        2010-10-22 14:46:28,422 INFO  [org.apache.whirr.service.hadoop.HadoopService] (main) Starting 1 worker node(s)
        2010-10-22 14:46:30,127 ERROR [jclouds.compute] (user thread 1) starting nodes, completed: 0/1, errors: 1, rate: 625ms/op
        java.util.concurrent.ExecutionException: org.jclouds.rest.ResourceNotFoundException: /v1.0/418003/servers
        	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        	at org.jclouds.concurrent.FutureIterables$1.run(FutureIterables.java:121)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        	at java.lang.Thread.run(Thread.java:637)
        Caused by: org.jclouds.rest.ResourceNotFoundException: /v1.0/418003/servers
        	at org.jclouds.rackspace.cloudservers.handlers.ParseCloudServersErrorFromHttpResponse.handleError(ParseCloudServersErrorFromHttpResponse.java:69)
        	at org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:70)
        	at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.shouldContinue(BaseHttpCommandExecutorService.java:193)
        	at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:163)
        	at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:132)
        	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        	... 3 more
        2010-10-22 14:46:30,128 ERROR [jclouds.compute] (main) starting nodes, completed: 0/1, errors: 1, rate: 627ms/op
        java.lang.RuntimeException: starting nodes, completed: 0/1, errors: 1, rate: 627ms/op
        	at org.jclouds.concurrent.FutureIterables.awaitCompletion(FutureIterables.java:139)
        	at org.jclouds.compute.internal.BaseComputeService.runNodesWithTag(BaseComputeService.java:160)
        	at org.apache.whirr.service.hadoop.HadoopService.launchCluster(HadoopService.java:168)
        	at org.apache.whirr.service.hadoop.integration.HadoopServiceTest.setUp(HadoopServiceTest.java:89)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        	at java.lang.reflect.Method.invoke(Method.java:597)
        	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
        	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
        	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
        	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
        	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
        	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
        	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
        	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
        	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
        	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
        	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        	at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
        	at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:115)
        	at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:102)
        	at org.apache.maven.surefire.Surefire.run(Surefire.java:180)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        	at java.lang.reflect.Method.invoke(Method.java:597)
        	at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
        	at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
        

        Adrian, any idea what this might be?

        Adrian Cole added a comment -

        First thing I'm noticing is that the same "tag" is used for the master node and the slave nodes. In jclouds the tag implies identical configuration, although, looking more closely now, I concede that it will probably still work.

        Adrian Cole added a comment -

        Looks like there's a bad configuration bubbling up. The error from jclouds should translate this into an IllegalArgumentException, as even though Rackspace returns a 404, it is really not accurate:

        2010-10-23 00:09:57,123 DEBUG [jclouds.wire] (i/o thread 0) >> "{"server":{"name":"hadoopclustertest-1ad","imageId":31,"flavorId":1}}"
        2010-10-23 00:09:57,124 DEBUG [jclouds.headers] (i/o thread 0) >> POST https://servers.api.rackspacecloud.com/v1.0/413274/servers?format=json&now=1287810597122 HTTP/1.1
        2010-10-23 00:09:57,124 DEBUG [jclouds.headers] (i/o thread 0) >> Accept: application/json
        2010-10-23 00:09:57,124 DEBUG [jclouds.headers] (i/o thread 0) >> X-Auth-Token: XXXXXXXXXXXXXXXXXX
        2010-10-23 00:09:57,124 DEBUG [jclouds.headers] (i/o thread 0) >> Content-Type: application/json
        2010-10-23 00:09:57,124 DEBUG [jclouds.headers] (i/o thread 0) >> Content-Length: 69
        2010-10-23 00:09:57,548 DEBUG [jclouds.headers] (i/o thread 0) << HTTP/1.1 404 Not Found
        2010-10-23 00:09:57,548 DEBUG [jclouds.headers] (i/o thread 0) << X-Varnish: 382298258
        2010-10-23 00:09:57,548 DEBUG [jclouds.headers] (i/o thread 0) << Age: 0
        2010-10-23 00:09:57,548 DEBUG [jclouds.headers] (i/o thread 0) << Date: Sat, 23 Oct 2010 05:09:57 GMT
        2010-10-23 00:09:57,548 DEBUG [jclouds.headers] (i/o thread 0) << Via: 1.1 varnish
        2010-10-23 00:09:57,548 DEBUG [jclouds.headers] (i/o thread 0) << Connection: keep-alive
        2010-10-23 00:09:57,548 DEBUG [jclouds.headers] (i/o thread 0) << Server: Apache-Coyote/1.1
        2010-10-23 00:09:57,548 DEBUG [jclouds.headers] (i/o thread 0) << vary: Accept, Accept-Encoding, X-Auth-Token
        2010-10-23 00:09:57,549 DEBUG [jclouds.headers] (i/o thread 0) << Cache-Control: no-cache
        2010-10-23 00:09:57,549 DEBUG [jclouds.headers] (i/o thread 0) << Content-Type: application/json
        2010-10-23 00:09:57,549 DEBUG [jclouds.headers] (i/o thread 0) << Content-Length: 166
        2010-10-23 00:09:57,549 DEBUG [jclouds.wire] (i/o thread 0) << "{"itemNotFound":{"message":"No offering found for flavor 1 and option 4","details":"com.rackspace.cloud.service.servers.ItemNotFoundFault: Fault occured","code":404}}"

        Adrian Cole added a comment -

        Yeah, looking at jclouds-compute.log, the second runNodesWithTag is getting the wrong image, as something is zeroing out the default template:

        template search from master:
        2010-10-23 00:09:05,370 DEBUG [jclouds.compute] (main) >> searching params([biggest=false, fastest=false, imageName=.10\.?04., imageDescription=null, imageId=null, imageVersion=null, location=[id=DFW1, scope=ZONE, description=Dallas, TX, parent=cloudservers], minCores=0.0, minRam=0, osFamily=ubuntu, osName=null, osDescription=null, osVersion=null, osArch=null, os64Bit=null, hardwareId=null])

        template search from slave:

        2010-10-23 00:09:52,353 DEBUG [jclouds.compute] (main) >> searching params([biggest=false, fastest=false, imageName=null, imageDescription=null, imageId=null, imageVersion=null, location=[id=DFW1, scope=ZONE, description=Dallas, TX, parent=cloudservers], minCores=0.0, minRam=0, osFamily=null, osName=null, osDescription=null, osVersion=null, osArch=null, os64Bit=null, hardwareId=null])

        Notice that imageName is nulled on the second search, which makes it grab something a bit too random.

        Adrian Cole added a comment -

        The following line is resetting the template:

        slaveTemplateBuilder.locationId(masterTemplate.getLocation().getId());

        Setting locationId on clouds has the potential to invalidate image choices, as images are not always available in all locations (e.g. EC2). As a precaution, when someone explicitly specifies a locationId, jclouds resets the rest of the template to its defaults. Now, we can argue about changing this behaviour, but as of beta-7 this is the case.

        A workaround is to comment out the above line. Then, we should figure out what logic jclouds should follow and request a corresponding change in the project. We can also patch whatever that is in whirr to make it immediately available.

        -Adrian
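
        The reset behaviour Adrian describes can be hard to picture from the logs alone. The toy builder below is a hypothetical, self-contained model of it, not the real jclouds TemplateBuilder API: an explicit locationId() call wipes a previously set image constraint, which is why the slave's search logged imageName=null while the master's did not.

        ```java
        // Toy model (NOT the real jclouds API) of the beta-7 behaviour
        // described above: explicitly setting a location resets the rest
        // of the template to defaults, dropping the image constraints.
        public class TemplateResetDemo {
            static class ToyTemplateBuilder {
                String imageName = "default-image";
                String locationId = "default-location";

                ToyTemplateBuilder imageNameMatches(String pattern) {
                    this.imageName = pattern;
                    return this;
                }

                // Mirrors the precaution in jclouds beta-7: choosing a
                // location explicitly invalidates earlier image choices.
                ToyTemplateBuilder locationId(String id) {
                    this.imageName = "default-image"; // image constraint wiped
                    this.locationId = id;
                    return this;
                }

                String build() {
                    return "image=" + imageName + ", location=" + locationId;
                }
            }

            public static void main(String[] args) {
                // Master: image constraint survives, no location is forced.
                String master = new ToyTemplateBuilder()
                        .imageNameMatches(".*10\\.?04.*")
                        .build();

                // Slave: the later locationId() call wipes the image
                // constraint, so the search falls back to a default image.
                String slave = new ToyTemplateBuilder()
                        .imageNameMatches(".*10\\.?04.*")
                        .locationId("DFW1")
                        .build();

                System.out.println("master -> " + master);
                System.out.println("slave  -> " + slave);
            }
        }
        ```

        Under this model, either calling locationId() before the image constraints or building a fresh template with all constraints supplied together avoids the reset.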

        Tom White added a comment -

        Thanks for tracking this down Adrian!

        > As a precaution, when someone explicitly specifies a locationId, jclouds resets the rest of the template to default. Now, we can argue to change this behaviour, but as of beta-7 this is the case.

        Yes, this was unexpected. Does EC2 have different behaviour?

        > A workaround is to comment out the above line. Then, we should figure out what logic jclouds should follow and request a corresponding change in the project. We can also patch whatever that is in whirr to make it immediately available.

        Setting the location ID was introduced in WHIRR-113. We could also just build a new template for the moment. I'll produce a new patch.

        Tom White added a comment -

        With this patch all the integration tests run on Rackspace (and they continue to pass on EC2).

        Tom White added a comment -

        I've just committed this.


          People

          • Assignee: Tom White
          • Reporter: Jeff Hammerbacher
          • Votes: 0
          • Watchers: 2
