Whirr: WHIRR-264

JClouds is unable to SSH into automatically selected images

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels:
      None

      Description

      I'm seeing the following exception when starting a cluster or running the integration tests without specifying an AMI and an instance type:

      org.jclouds.ssh.SshException: ec2-user@184.72.64.23:22: Error connecting to session.
      	at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
      	at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
      	at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
      	at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
      	at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:45)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: com.jcraft.jsch.JSchException: Auth fail
      	at com.jcraft.jsch.Session.connect(Session.java:461)
      	at com.jcraft.jsch.Session.connect(Session.java:154)
      	at org.jclouds.ssh.jsch.JschSshClient.newSession(JschSshClient.java:247)
      	at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:186)
      	... 8 more
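      The root cause in the trace above is the nested com.jcraft.jsch.JSchException: Auth fail, which jclouds wraps in an SshException. When triaging failures like this it can help to walk the cause chain instead of matching on the outer exception. A minimal sketch using only the JDK; the class and method names are hypothetical, and matching on the message text "Auth fail" is an assumption based on this one trace, not a documented jclouds or JSch contract:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper, not part of jclouds or JSch. */
public final class AuthFailureCheck {

    private AuthFailureCheck() {}

    /**
     * Returns true if any throwable in the cause chain has a message containing
     * "Auth fail", the text JSch produced in this report. Matching on message
     * text is an assumption, not a documented contract.
     */
    public static boolean isAuthFailure(Throwable t) {
        // Identity-based set so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains("Auth fail")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same shape as the trace: a connection error wrapping an "Auth fail" cause.
        Exception cause = new Exception("Auth fail");
        RuntimeException wrapped = new RuntimeException(
            "ec2-user@184.72.64.23:22: Error connecting to session.", cause);
        System.out.println(isAuthFailure(wrapped));                           // prints true
        System.out.println(isAuthFailure(new RuntimeException("timed out"))); // prints false
    }
}
```

      Checking the whole chain matters here because the outer message ("Error connecting to session.") does not say why the connection failed.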
      

      I was able to run the entire test suite after changing the properties files to specify an image and an instance type (Ubuntu 10.04 on a machine with 2 GB of memory or more).
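      The workaround of pinning an image and an instance type, rather than letting jclouds select them, can be applied to a copy of a recipe. A minimal sketch, assuming Whirr's whirr.image-id and whirr.hardware-id property names (verify them against the configuration docs for your Whirr version); the class name and the output file name zookeeper-ec2-pinned.properties are hypothetical, and the image/instance pair is the combination reported as working later in this issue (us-east-1/ami-da0cf8b3 on m1.large). The file is edited line by line so comments and existing escaping survive; multi-line (backslash-continued) values for these two keys are not handled:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: pins the image and hardware in a copy of a Whirr recipe. */
public final class PinImage {

    // Matches an existing whirr.image-id or whirr.hardware-id setting ("key=v", "key: v" or "key v").
    private static final Pattern PINNED_KEY =
        Pattern.compile("\\s*whirr\\.(image|hardware)-id(\\s*[=:\\s].*)?");

    private PinImage() {}

    /** Returns the recipe lines with any existing image/hardware settings replaced by the given ones. */
    public static List<String> pin(List<String> lines, String imageId, String hardwareId) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            // Drop old values so each key appears exactly once; commented-out lines are kept verbatim.
            if (!PINNED_KEY.matcher(line).matches()) {
                out.add(line);
            }
        }
        // Assumed Whirr property names.
        out.add("whirr.image-id=" + imageId);
        out.add("whirr.hardware-id=" + hardwareId);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args.length > 0 ? args[0] : "recipes/zookeeper-ec2.properties");
        Path out = Paths.get(args.length > 1 ? args[1] : "zookeeper-ec2-pinned.properties");
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, pin(lines, "us-east-1/ami-da0cf8b3", "m1.large"), StandardCharsets.UTF_8);
    }
}
```

      The pinned copy would then be launched the same way as the original recipe, e.g. ./bin/whirr launch-cluster --config zookeeper-ec2-pinned.properties.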

      You can reproduce the problem by trying to run the ZooKeeper recipe:
      $ ./bin/whirr launch-cluster --config recipes/zookeeper-ec2.properties

      I've experienced this problem with the following AMI: us-east-1/ami-8e1fece7 running on a t1.micro instance type.

        Activity

        Adrian Cole added a comment -

        This is probably limited to the Amazon Linux image in aws-ec2, rather than affecting automatically selected images in general. Will check into it now!

        Adrian Cole added a comment -

        I just ran the Cassandra tests from trunk, whose default test uses t1.micro and the same AMI, and saw no auth problems during bootstrap. However, they do hit auth problems later, during configure.

        Did your auth error in ZK come during bootstrap or configure?

        Andrei Savu added a comment -

        I've seen this error during configure. The output from the bootstrap phase looks ok to me.

        Andrei Savu added a comment -

        It's strange that we are seeing this behavior only for some AMIs. I have been able to run all the integration tests on us-east-1/ami-da0cf8b3 running on an m1.large instance.

        Adrian Cole added a comment -

        The problem is that, during configure, we depend on the state of a user we don't define. For example, we modify the authorized keys and private key of the default user, which varies from image to image and from cloud to cloud. This has proven problematic, as the image can change how this user is defined: bootstrap may work well, yet key installation may fail because of a subtlety in how that user is configured. The real way out of this is to stop depending on the installed user and instead install our own.

        https://issues.apache.org/jira/browse/WHIRR-158
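        Installing our own user instead of relying on the image's default login user (ec2-user on the Amazon Linux image in the trace above) can be sketched as a bootstrap script that creates that user and installs its public key. This is a minimal illustration under stated assumptions, not the WHIRR-158 patch: the class and method names and the default shell are hypothetical, granting the user root privileges is left out, and the commands assume a Linux image where useradd creates a same-named group for the user:

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: builds a shell script that installs a dedicated cluster user. */
public final class ClusterUserScript {

    // Conservative user-name check so the name is safe to splice into shell commands.
    private static final Pattern USER = Pattern.compile("[a-z_][a-z0-9_-]{0,31}");

    private ClusterUserScript() {}

    public static String build(String user, String publicKey) {
        if (!USER.matcher(user).matches()) {
            throw new IllegalArgumentException("unsafe user name: " + user);
        }
        if (publicKey.contains("'") || publicKey.contains("\n")) {
            throw new IllegalArgumentException("public key must be one line without single quotes");
        }
        String home = "/home/" + user;
        return String.join("\n",
            "#!/bin/bash",
            "set -e",
            // Create the user with a home directory unless it already exists.
            "id -u " + user + " >/dev/null 2>&1 || useradd -m -s /bin/bash " + user,
            "mkdir -p " + home + "/.ssh",
            // Install our key, independent of whatever login user the image ships with.
            "echo '" + publicKey + "' >> " + home + "/.ssh/authorized_keys",
            "chmod 700 " + home + "/.ssh",
            "chmod 600 " + home + "/.ssh/authorized_keys",
            // Assumes useradd created a group with the same name as the user.
            "chown -R " + user + ":" + user + " " + home + "/.ssh",
            "");
    }

    public static void main(String[] args) {
        // Placeholder key for illustration only.
        System.out.print(build("whirr", "ssh-rsa AAAAB3placeholder user@example"));
    }
}
```

        Because every later SSH step would log in as this known user, configure no longer depends on how a particular image defines its default account.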

        Andrei Savu added a comment -

        I understand. Is there anything we can do now to fix this issue so we can release 0.4.0, or do we need to create a patch for WHIRR-158?

        Adrian Cole added a comment -

        I've tested the patch in WHIRR-158. I think it would be more sustainable to push it through: not only does it fix this issue, it also makes Whirr easier to troubleshoot (e.g. you don't have to remember the login user of the image).

        David Alves added a comment -

        I was having this problem (bootstrap ok but failed config phase) with my own recipe and I can confirm that the WHIRR-158 patch solves it.

        Andrei Savu added a comment -

        Closing as "Not A Problem" thanks to the fix from WHIRR-158.


          People

          • Assignee: Adrian Cole
          • Reporter: Andrei Savu
          • Votes: 0
          • Watchers: 0
