Whirr
WHIRR-588

Timeouts on whirr run-script command

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.7.1
    • Fix Version/s: 0.8.0
    • Component/s: core
    • Labels:
      None

      Description

      I started a Hadoop cluster using the following additional configuration:

      jclouds.compute.timeout.script-complete=1800000
      

      and then ran a script like this:

      whirr run-script --script <script> --roles <role> --config <config>
      

      I keep getting timeouts:

      ]) (out of retries - max 7): Read timed out
      	at org.jclouds.sshj.SshjSshClient.propagate(SshjSshClient.java:417)
      	at org.jclouds.sshj.SshjSshClient.acquire(SshjSshClient.java:238)
      	at org.jclouds.sshj.SshjSshClient.exec(SshjSshClient.java:499)
      	at org.jclouds.compute.callables.RunScriptOnNodeUsingSsh.runCommand(RunScriptOnNodeUsingSsh.java:113)
      	at org.jclouds.compute.callables.RunScriptOnNodeUsingSsh.call(RunScriptOnNodeUsingSsh.java:86)
      	at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:69)
      	at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:44)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: net.schmizz.sshj.connection.ConnectionException: Read timed out
      	at net.schmizz.sshj.connection.ConnectionException$1.chain(ConnectionException.java:32)
      	at net.schmizz.sshj.connection.ConnectionException$1.chain(ConnectionException.java:26)
      	at net.schmizz.concurrent.Promise.deliverError(Promise.java:95)
      	at net.schmizz.concurrent.Event.deliverError(Event.java:72)
      	at net.schmizz.concurrent.ErrorDeliveryUtil.alertEvents(ErrorDeliveryUtil.java:34)
      	at net.schmizz.sshj.connection.channel.AbstractChannel.notifyError(AbstractChannel.java:239)
      	at net.schmizz.sshj.connection.channel.direct.SessionChannel.notifyError(SessionChannel.java:249)
      	at net.schmizz.sshj.common.ErrorNotifiable$Util.alertAll(ErrorNotifiable.java:35)
      	at net.schmizz.sshj.connection.ConnectionImpl.notifyError(ConnectionImpl.java:250)
      	at net.schmizz.sshj.transport.TransportImpl.die(TransportImpl.java:578)
      	at net.schmizz.sshj.transport.Reader.run(Reader.java:79)
      Caused by: net.schmizz.sshj.common.SSHException: Read timed out
      	at net.schmizz.sshj.common.SSHException$1.chain(SSHException.java:56)
      	at net.schmizz.sshj.common.SSHException$1.chain(SSHException.java:49)
      	at net.schmizz.sshj.transport.TransportImpl.die(TransportImpl.java:572)
      	... 1 more
      Caused by: java.net.SocketTimeoutException: Read timed out
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.read(SocketInputStream.java:129)
      	at net.schmizz.sshj.transport.Reader.run(Reader.java:68)
      
      
      1 error[s]
      	at org.jclouds.compute.internal.BaseComputeService.runScriptOnNodesMatching(BaseComputeService.java:529)
      	at org.apache.whirr.ClusterController.runScriptOnNodesMatching(ClusterController.java:278)
      	at org.apache.whirr.ClusterController.runScriptOnNodesMatching(ClusterController.java:259)
      	at org.apache.whirr.cli.command.RunScriptCommand.run(RunScriptCommand.java:125)
      	at org.apache.whirr.cli.command.RunScriptCommand.run(RunScriptCommand.java:111)
      	at org.apache.whirr.cli.Main.run(Main.java:69)
      	at org.apache.whirr.cli.Main.main(Main.java:102)
      

        Activity

        Frank Scholten added a comment -

        Shorter commands do work. However, when I try to mount an EBS volume, for example, I get timeouts:

        sudo su
        
        # mount the Amazon ASF Email Public Dataset
        mkdir /mnt/asf-email
        mount /dev/sdf /mnt/asf-email
        
        Adrian Cole added a comment -

        Normally, read timeouts can be a sign that the script is waiting for user input (e.g. blocking on a sudo password prompt).

        If you look at jclouds-ssh.log, it might tell you more about what happened.

        Karel Vervaeke added a comment -

        Frank, did you get a chance to look for signs of a user-input prompt or some other blockage?

        Frank Scholten added a comment -

        Adrian was right: the sudo su command was the culprit. I used sudo instead and now it works normally.
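For illustration, here is a minimal sketch of what the reworked script might look like, with each privileged step going through sudo instead of a root shell opened by sudo su. One likely reading of the timeout is that sudo su starts a shell that waits on standard input, which a non-interactive SSH session never supplies; the later lines of the script would not have run as root in any case. The function names and the DRY_RUN switch are hypothetical, added only so the commands can be previewed safely, and `mkdir -p` is used in place of the original `mkdir` so that a rerun does not fail.

```shell
#!/bin/sh
# Sketch only: run each privileged step via `sudo` rather than `sudo su`.
# DRY_RUN, run_as_root and mount_dataset are hypothetical names for this
# example. DRY_RUN defaults to 1 (print the commands); set DRY_RUN=0 to
# actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run_as_root() {
    if [ "$DRY_RUN" = "1" ]; then
        # Preview mode: show the command that would be run.
        echo "sudo $*"
    else
        sudo "$@"
    fi
}

mount_dataset() {
    device=$1
    mount_point=$2
    run_as_root mkdir -p "$mount_point"
    run_as_root mount "$device" "$mount_point"
}

# mount the Amazon ASF Email Public Dataset
mount_dataset /dev/sdf /mnt/asf-email
```

Saved to a file, this could be passed to whirr run-script through the same --script option shown in the description, with DRY_RUN=0 set once the printed commands look right.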


          People

          • Assignee: Unassigned
          • Reporter: Frank Scholten