  1. Airavata
  2. AIRAVATA-2327

Process status messages lost by orchestrator

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.17
    • Fix Version/s: 0.18
    • Component/s: Airavata Orchestrator
    • Labels: None

    Description

      Zhong, with the dREG gateway, reported an experiment whose status was "stuck" in EXECUTING even though the job had status COMPLETED. It looks like the api-orch service on gw56 was shut down, probably while the orchestrator was handling the COMPLETED process status message. The process status subscriber acks messages automatically, so the message had already been removed from the queue and was no longer available when the orchestrator was restarted.
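
      To make the suspected failure mode concrete, here is a minimal in-memory sketch of auto-ack versus explicit ack. It is not Airavata or broker code: StatusQueue, consumerDied and messagesLeftAfterCrash are hypothetical names, and the redelivery behaviour is a simplified assumption about how an ack-based queue treats a consumer that disappears.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** In-memory stand-in for a message queue; hypothetical, not the Airavata API. */
final class AckDemo {

    /** Queue that puts back any message that was delivered but never acked. */
    static final class StatusQueue {
        private final Deque<String> ready = new ArrayDeque<>();
        private final List<String> unacked = new ArrayList<>();

        void publish(String message) {
            ready.addLast(message);
        }

        /** Hands out the next message; with autoAck it is forgotten immediately. */
        String deliver(boolean autoAck) {
            String message = ready.pollFirst();
            if (message != null && !autoAck) {
                unacked.add(message); // remembered until ack() is called
            }
            return message;
        }

        /** Explicit ack: the consumer confirms it finished handling the message. */
        void ack(String message) {
            unacked.remove(message);
        }

        /** Consumer went away: unacked messages return to the front of the queue. */
        void consumerDied() {
            for (int i = unacked.size() - 1; i >= 0; i--) {
                ready.addFirst(unacked.get(i));
            }
            unacked.clear();
        }

        int pending() {
            return ready.size();
        }
    }

    /**
     * Simulates a shutdown after delivery but before handling finishes.
     * Returns how many messages are left for the restarted consumer.
     */
    static int messagesLeftAfterCrash(boolean autoAck) {
        StatusQueue queue = new StatusQueue();
        queue.publish("process status: COMPLETED");
        queue.deliver(autoAck); // message handed to the orchestrator
        // The orchestrator is shut down here, so ack() is never called.
        queue.consumerDied();
        return queue.pending();
    }

    public static void main(String[] args) {
        System.out.println("auto-ack:     " + messagesLeftAfterCrash(true));
        System.out.println("explicit ack: " + messagesLeftAfterCrash(false));
    }
}
```

      With auto-ack nothing is left for the restarted consumer, which matches the lost COMPLETED message described above. With explicit ack the message is still pending after the restart.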

      In the gfac log, the process completes at 2017-02-17 13:41:01:

      2017-02-17 13:41:01 [pool-9-thread-11] INFO o.a.a.g.core.context.ProcessContext - expId: Clone_of_2M_data_82c732b8-5bd5-4e24-b1cc-ce3fd480d677, processId: PROCESS_3b22553a-b9ed-4250-a1dd-8b555ecede80 :- Process status changed OUTPUT_DATA_S
      

      api-orch was shut down and restarted several times around the same time:

      2017-02-17 13:37:03 [main] INFO o.a.a.api.server.AiravataAPIServer - API server started over TLS on Port: 9930 ...
      ...
      2017-02-17 13:40:23 [main] INFO o.a.a.api.server.AiravataAPIServer - API server started over TLS on Port: 9930 ...
      ...
      2017-02-17 13:43:02 [main] INFO o.a.a.api.server.AiravataAPIServer - API server started over TLS on Port: 9930 ...
      ...
      2017-02-17 13:48:23 [main] INFO o.a.a.api.server.AiravataAPIServer - API server started over TLS on Port: 9930 ...
      ...
      2017-02-17 14:10:58 [main] INFO o.a.a.api.server.AiravataAPIServer - API server started over TLS on Port: 9930 ...
      

      A couple of solution ideas:

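      • make the process status queue subscriber acknowledge messages explicitly, only after the orchestrator has finished handling them, instead of auto-acking on delivery
      • have the orchestrator check the process status in the registry for every incomplete experiment when it starts up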
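
      The second idea could look roughly like the sketch below. It assumes a much simplified registry: the Registry interface, InMemoryRegistry and reconcile are hypothetical names for illustration, not the real Airavata registry API, and the rule "process COMPLETED implies experiment COMPLETED" is a simplification of the actual status model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a startup check; all types here are hypothetical. */
final class StartupReconciler {

    /** Simplified view of the registry, assumed for this example. */
    interface Registry {
        /** Returns experimentId -> processId for experiments not yet COMPLETED. */
        Map<String, String> incompleteExperiments();

        String processStatus(String processId);

        void setExperimentStatus(String experimentId, String status);
    }

    /** In-memory registry so the sketch can run without a database. */
    static final class InMemoryRegistry implements Registry {
        private final Map<String, String> experimentStatus = new LinkedHashMap<>();
        private final Map<String, String> experimentProcess = new LinkedHashMap<>();
        private final Map<String, String> processStatus = new LinkedHashMap<>();

        void addExperiment(String expId, String expStatus, String procId, String procStatus) {
            experimentStatus.put(expId, expStatus);
            experimentProcess.put(expId, procId);
            processStatus.put(procId, procStatus);
        }

        String experimentStatus(String expId) {
            return experimentStatus.get(expId);
        }

        @Override
        public Map<String, String> incompleteExperiments() {
            // Build a copy so callers can update statuses while iterating.
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : experimentStatus.entrySet()) {
                if (!"COMPLETED".equals(e.getValue())) {
                    result.put(e.getKey(), experimentProcess.get(e.getKey()));
                }
            }
            return result;
        }

        @Override
        public String processStatus(String processId) {
            return processStatus.get(processId);
        }

        @Override
        public void setExperimentStatus(String experimentId, String status) {
            experimentStatus.put(experimentId, status);
        }
    }

    /** Run once at startup: repairs experiments whose process already finished. */
    static List<String> reconcile(Registry registry) {
        List<String> repaired = new ArrayList<>();
        for (Map.Entry<String, String> e : registry.incompleteExperiments().entrySet()) {
            if ("COMPLETED".equals(registry.processStatus(e.getValue()))) {
                registry.setExperimentStatus(e.getKey(), "COMPLETED");
                repaired.add(e.getKey());
            }
        }
        return repaired;
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        registry.addExperiment("exp-1", "EXECUTING", "proc-1", "COMPLETED");
        registry.addExperiment("exp-2", "EXECUTING", "proc-2", "EXECUTING");
        System.out.println("repaired: " + reconcile(registry));
    }
}
```

      The two ideas are complementary: explicit acks narrow the window in which a message can be lost, while a startup check repairs any experiment whose status update was missed for whatever reason.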

          People

            shameera Shameera
            marcuschristie Marcus Christie