  1. Airavata
  2. AIRAVATA-2826

Job submissions to the Jetstream cluster failed when the Helix participant server was stopped and restarted while experiments were being launched

Details

    Description

      1. Experiments were being launched while the Helix participant was stopped and restarted.
      2. After the Helix participant was restarted, jobs submitted to Jetstream in particular failed.
      3. Job submission failed because environment setup on Jetstream failed with error [1].

      [1]

      org.apache.airavata.helix.impl.task.TaskOnFailException: Error Code : 658d46e9-b08b-46c0-9701-4bf5eeb23134, Task TASK_f4e3eccf-3e03-4d34-9cf0-7028efd09a40 failed due to Failed to setup environment of task TASK_f4e3eccf-3e03-4d34-9cf0-7028efd09a40, net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
          at org.apache.airavata.helix.impl.task.AiravataTask.onFail(AiravataTask.java:102)
          at org.apache.airavata.helix.impl.task.env.EnvSetupTask.onRun(EnvSetupTask.java:55)
          at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:311)
          at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:90)
          at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.airavata.agents.api.AgentException: net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
          at org.apache.airavata.helix.adaptor.SSHJAgentAdaptor.createDirectory(SSHJAgentAdaptor.java:146)
          at org.apache.airavata.helix.impl.task.env.EnvSetupTask.onRun(EnvSetupTask.java:51)
          ... 10 more
      Caused by: net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
          at net.schmizz.keepalive.KeepAliveRunner.checkMaxReached(KeepAliveRunner.java:64)
          at net.schmizz.keepalive.KeepAliveRunner.doKeepAlive(KeepAliveRunner.java:56)
          at net.schmizz.keepalive.KeepAlive.run(KeepAlive.java:63)
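
The trace shows the directory-creation step of environment setup failing once the SSH connection was reported lost. One possible mitigation, sketched here only as an illustration, is to retry such a step a bounded number of times with a growing delay before failing the task. The class `RetryOnConnectionLoss` and its method `withRetry` are hypothetical names and not part of Airavata or sshj; a plain `IOException` stands in for the connection failure, whereas in the trace it surfaces as an `AgentException` wrapping the sshj `ConnectionException`. Whether a retry is appropriate here, and whether the underlying SSH session would need to be re-established first, is an assumption that would have to be checked against the actual adaptor code.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not Airavata code): retries an action that fails with an IOException. */
public class RetryOnConnectionLoss {

    /**
     * Runs the action up to maxAttempts times. After each failed attempt except the last,
     * sleeps backoffMillis multiplied by the attempt number, then tries again.
     * Rethrows the last IOException if every attempt fails.
     */
    public static <T> T withRetry(Callable<T> action, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                last = e; // remember the failure and retry if attempts remain
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated step: fails twice with a connection-lost style error, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("[CONNECTION_LOST] simulated");
            }
            return "created";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "created after 3 attempts"
    }
}
```

In a real task the lambda would wrap the failing call, and the exception type caught would need to match whatever the adaptor actually throws.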

          People

            dimuthuupe Dimuthu
            eroma_a Eroma