  1. Airavata
  2. AIRAVATA-755

Incompatible parameters in the HPC configuration need a validation process


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 0.6
    • Fix Version/s: WISHLIST
    • Component/s: GFac, XBaya
    • Labels: None

    Description

      I set up a workflow to run on Lonestar with the following HPC configuration:

      Max Wall time: 1440
      CPU Count: 64
      Node: 6
      Processor Per Node: 12
      Min Memory: 10240
      Max Memory: 15360

      It had four inputs and a single output. Unfortunately, my workflow task was never submitted to Lonestar, and the Airavata server logged the error message below.

      When I changed only the parameter "CPU Count: 64" to "CPU Count: 72" (6 nodes × 12 processors per node), the workflow task was submitted to Lonestar correctly.

      ===========
      Error Message
      ======================================================================
      [INFO] ----DATA----
      [INFO] https://gridftp1.ls4.tacc.utexas.edu:50393/16289878754566973521/8943296923859968446/
      [INFO] gridftp1.ls4.tacc.utexas.edu:2119/jobmanager-sge
      [INFO] null
      [INFO] null
      [INFO] /C=US/O=National Center for Supercomputing Applications/CN=OGCE Community User
      [INFO] null
      [INFO] &( queue = "normal" )( stdout = "/scratch/01437/ogce/Vlab/Phonon/_p3_14/AppPhononSingle_Wed_Jan_30_19_33_57_CST_2013_81e76edb-dec5-4a04-ac8b-dead149cf5b3/lonestar_application.stdout" )( count = "64" )( executable = "/scratch/01437/ogce/Vlab/Phonon/executePhonon.sh" )( stderr = "/scratch/01437/ogce/Vlab/Phonon/p3_14/AppPhononSingle_Wed_Jan_30_19_33_57_CST_2013_81e76edb-dec5-4a04-ac8b-dead149cf5b3/lonestar_application.stderr" )( maxwalltime = "1440" )( hostCount = "6" )( minmemory = "10240" )( project = "TG-STA110014S" )( jobtype = "mpi" )( environment = ( "inputData" "/scratch/01437/ogce/Vlab/Phonon/p3_14/AppPhononSingle_Wed_Jan_30_19_33_57_CST_2013_81e76edb-dec5-4a04-ac8b-dead149cf5b3/inputData" ) ( "outputData" "/scratch/01437/ogce/Vlab/Phonon/p3_14/AppPhononSingle_Wed_Jan_30_19_33_57_CST_2013_81e76edb-dec5-4a04-ac8b-dead149cf5b3/outputData" ) )( proxy_timeout = "1" )( arguments = "///scratch/01437/ogce/Vlab/Phonon/p3_14/AppPhononSingle_Wed_Jan_30_19_33_57_CST_2013_81e76edb-dec5-4a04-ac8b-dead149cf5b3/inputData/Pwscf_Input" "///scratch/01437/ogce/Vlab/Phonon/p3_14/AppPhononSingle_Wed_Jan_30_19_33_57_CST_2013_81e76edb-dec5-4a04-ac8b-dead149cf5b3/inputData/Cd_PON_sp_LDA.vdb" "///scratch/01437/ogce/Vlab/Phonon/p3_14/AppPhononSingle_Wed_Jan_30_19_33_57_CST_2013_81e76edb-dec5-4a04-ac8b-dead149cf5b3/inputData/Te_PON_LDA.vdb" "///scratch/01437/ogce/Vlab/Phonon/p3_14/AppPhononSingle_Wed_Jan_30_19_33_57_CST_2013_81e76edb-dec5-4a04-ac8b-dead149cf5b3/inputData/Phonon_Input" )( directory = "/scratch/01437/ogce/Vlab/Phonon/_p3_14/AppPhononSingle_Wed_Jan_30_19_33_57_CST_2013_81e76edb-dec5-4a04-ac8b-dead149cf5b3" )( maxmemory = "15360" )
      [INFO] ----END DATA----
      [INFO] Status is zero
      [INFO] Status of job https://gridftp1.ls4.tacc.utexas.edu:50393/16289878754566973521/8943296923859968446/is FAILED
      [INFO] ----DATA----
      [INFO] Status of job https://gridftp1.ls4.tacc.utexas.edu:50393/16289878754566973521/8943296923859968446/is FAILED
      [INFO] ----END DATA----
      [INFO] Job Error Code: 14
      [ERROR] Context passed was NULL.
      java.lang.RuntimeException: Context passed was NULL.
      at org.apache.airavata.workflow.tracking.impl.ProvenanceNotifierImpl.sendingFault(ProvenanceNotifierImpl.java:496)
      at org.apache.airavata.workflow.tracking.impl.ProvenanceNotifierImpl.sendingFault(ProvenanceNotifierImpl.java:485)
      at org.apache.airavata.core.gfac.notification.impl.WorkflowTrackingNotification.executionFail(WorkflowTrackingNotification.java:108)
      at org.apache.airavata.core.gfac.notification.impl.DefaultNotifier.executionFail(DefaultNotifier.java:135)
      at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:225)
      at org.apache.airavata.core.gfac.provider.AbstractProvider.execute(AbstractProvider.java:69)
      at org.apache.airavata.core.gfac.services.impl.AbstractSimpleService.execute(AbstractSimpleService.java:118)
      at org.apache.airavata.core.gfac.GfacAPI.gridJobSubmit(GfacAPI.java:140)
      at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:256)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.handleWSComponent(WorkflowInterpreter.java:749)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.executeDynamically(WorkflowInterpreter.java:533)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.scheduleDynamically(WorkflowInterpreter.java:218)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:389)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.access$400(WorkflowInterpretorSkeleton.java:87)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton$2.run(WorkflowInterpretorSkeleton.java:382)
      at java.lang.Thread.run(Thread.java:680)
      [INFO] ----DATA----
      [INFO] Job Protocol : https
      Host name : gridftp1.ls4.tacc.utexas.edu
      Port number : 50393
      Url path : 16289878754566973521/8943296923859968446/
      User : null
      Pwd : null
      on host lonestar4.tacc.teragrid.org Job Exit Code = 14
      [INFO] ----END DATA----
      [ERROR] Job Protocol : https
      Host name : gridftp1.ls4.tacc.utexas.edu
      Port number : 50393
      Url path : 16289878754566973521/8943296923859968446/
      User : null
      Pwd : null
      on host lonestar4.tacc.teragrid.org Job Exit Code = 14
      org.apache.airavata.core.gfac.exception.JobSubmissionFault: Job Protocol : https
      Host name : gridftp1.ls4.tacc.utexas.edu
      Port number : 50393
      Url path : 16289878754566973521/8943296923859968446/
      User : null
      Pwd : null
      on host lonestar4.tacc.teragrid.org Job Exit Code = 14
      at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:222)
      at org.apache.airavata.core.gfac.provider.AbstractProvider.execute(AbstractProvider.java:69)
      at org.apache.airavata.core.gfac.services.impl.AbstractSimpleService.execute(AbstractSimpleService.java:118)
      at org.apache.airavata.core.gfac.GfacAPI.gridJobSubmit(GfacAPI.java:140)
      at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:256)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.handleWSComponent(WorkflowInterpreter.java:749)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.executeDynamically(WorkflowInterpreter.java:533)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.scheduleDynamically(WorkflowInterpreter.java:218)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:389)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.access$400(WorkflowInterpretorSkeleton.java:87)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton$2.run(WorkflowInterpretorSkeleton.java:382)
      at java.lang.Thread.run(Thread.java:680)
      Caused by: java.lang.Exception: Job Protocol : https
      Host name : gridftp1.ls4.tacc.utexas.edu
      Port number : 50393
      Url path : 16289878754566973521/8943296923859968446/
      User : null
      Pwd : null
      on host lonestar4.tacc.teragrid.org Job Exit Code = 14
      ... 12 more
      Exception in thread "Thread-58" org.apache.airavata.workflow.model.exceptions.WorkflowRuntimeException: org.apache.airavata.workflow.model.exceptions.WorkflowException: Job Protocol : https
      Host name : gridftp1.ls4.tacc.utexas.edu
      Port number : 50393
      Url path : 16289878754566973521/8943296923859968446/
      User : null
      Pwd : null
      on host lonestar4.tacc.teragrid.org Job Exit Code = 14
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:392)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.access$400(WorkflowInterpretorSkeleton.java:87)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton$2.run(WorkflowInterpretorSkeleton.java:382)
      at java.lang.Thread.run(Thread.java:680)
      Caused by: org.apache.airavata.workflow.model.exceptions.WorkflowException: Job Protocol : https
      Host name : gridftp1.ls4.tacc.utexas.edu
      Port number : 50393
      Url path : 16289878754566973521/8943296923859968446/
      User : null
      Pwd : null
      on host lonestar4.tacc.teragrid.org Job Exit Code = 14
      at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:321)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.handleWSComponent(WorkflowInterpreter.java:749)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.executeDynamically(WorkflowInterpreter.java:533)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.scheduleDynamically(WorkflowInterpreter.java:218)
      at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:389)
      ... 3 more
      Caused by: org.apache.airavata.core.gfac.exception.JobSubmissionFault: Job Protocol : https
      Host name : gridftp1.ls4.tacc.utexas.edu
      Port number : 50393
      Url path : 16289878754566973521/8943296923859968446/
      User : null
      Pwd : null
      on host lonestar4.tacc.teragrid.org Job Exit Code = 14
      at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:222)
      at org.apache.airavata.core.gfac.provider.AbstractProvider.execute(AbstractProvider.java:69)
      at org.apache.airavata.core.gfac.services.impl.AbstractSimpleService.execute(AbstractSimpleService.java:118)
      at org.apache.airavata.core.gfac.GfacAPI.gridJobSubmit(GfacAPI.java:140)
      at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:256)
      ... 7 more
      Caused by: java.lang.Exception: Job Protocol : https
      Host name : gridftp1.ls4.tacc.utexas.edu
      Port number : 50393
      Url path : 16289878754566973521/8943296923859968446/
      User : null
      Pwd : null
      on host lonestar4.tacc.teragrid.org Job Exit Code = 14
      ... 12 more
      ======================================================================
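
      The kind of validation requested here could be sketched as follows. This is only an illustration, not Airavata code: the class name HpcConfigCheck and the method validate are hypothetical, and the rule it enforces (CPU count must equal node count times processors per node) is an assumption inferred from the 64 versus 72 observation above. The actual constraint applied by the Lonestar scheduler may differ.

```java
// Hypothetical pre-submission check; names and rule are assumptions, not Airavata API.
final class HpcConfigCheck {

    /**
     * Returns null when the three values are mutually consistent,
     * otherwise a human-readable description of the mismatch.
     */
    static String validate(int cpuCount, int nodeCount, int processorsPerNode) {
        if (cpuCount <= 0 || nodeCount <= 0 || processorsPerNode <= 0) {
            return "CPU count, node count and processors per node must all be positive";
        }
        // Assumed rule: the requested CPU count must fill the requested nodes exactly.
        int expected = nodeCount * processorsPerNode;
        if (cpuCount != expected) {
            return "CPU count " + cpuCount + " does not match " + nodeCount
                    + " nodes x " + processorsPerNode
                    + " processors per node = " + expected;
        }
        return null;
    }

    public static void main(String[] args) {
        // Values from this report: the first was rejected, the second was accepted.
        System.out.println(validate(64, 6, 12));
        System.out.println(validate(72, 6, 12));
    }
}
```

      Running a check like this before the job description is built would let the client report the mismatch directly, instead of surfacing it as a failed job with exit code 14.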

          People

            Assignee: Lahiru Gunathilake (lahiru)
            Reporter: Pedro da Silveira (pedrorcs)