  1. Airavata
  2. AIRAVATA-2519

Email monitoring stopped, without errors



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.18
    • Fix Version/s: None
    • Component/s: GFac
    • Labels: None


      At 12:16 am on 2017-09-20 the EmailBasedMonitor appears to have stopped working and died silently:

      From HipChat:

      @marlon I didn't see any errors in the gfac log; it's just that the last log messages from the EmailBasedMonitor, where it is processing emails, occur at 2017-09-20 00:16:49,447

      Here are the last messages from the EmailBasedMonitor in the logs:

      2017-09-20 00:16:25,815 [Thread-5] ERROR o.a.a.g.m.e.EmailBasedMonitor  - FROM: root <root@ncsa.illinois.edu>
      2017-09-20 00:16:25,815 [Thread-5] ERROR o.a.a.g.m.e.EmailBasedMonitor  - TO: gw77jobs@scigap.org
      2017-09-20 00:16:25,815 [Thread-5] ERROR o.a.a.g.m.e.EmailBasedMonitor  - SUBJECT: Non-zero exit code for job 3231343
      2017-09-20 00:16:41,930 [Thread-5] INFO  o.a.a.g.m.e.EmailBasedMonitor  - [EJM]: 5 job/s in job monitor map
      2017-09-20 00:16:42,167 [Thread-5] INFO  o.a.a.g.m.e.EmailBasedMonitor  - [EJM]: Retrieving unseen emails
      2017-09-20 00:16:42,913 [Thread-5] INFO  o.a.a.g.m.e.EmailBasedMonitor  - [EJM]: 75 new email/s received
      2017-09-20 00:16:49,447 [Thread-5] ERROR o.a.a.g.m.e.p.PBSEmailParser  - [EJM]: No matched found for content => 
      PBS Job Id: 48.torque-server
      Job Name:   A746448754
      Exec host:  compute-1/0-3
      An error has occurred processing your job, see below.
      Post job file processing error; job 48.torque-server on host compute-1Unknown resource type  REJHOST=compute-1 MSG=Root cannot open home directory '/home/grid_user' specified, errno=2 (No such file or directory) -- Ignore if root squashing is enabled
      2017-09-20 00:16:49,447 [Thread-5] INFO  o.a.a.g.m.e.EmailBasedMonitor  - Returned null for job id, message subject--> PBS JOB 48.torque-server
      2017-09-20 00:16:49,447 [Thread-5] INFO  o.a.a.g.m.e.EmailBasedMonitor  - Returned null for job name, message subject --> PBS JOB 48.torque-server
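
      To confirm when the monitor thread last wrote anything, the log can be scanned for the latest timestamp on that thread. The following is a minimal sketch, assuming the line layout seen in the excerpt above (timestamp, thread in brackets, level, logger, message); the class and method names are hypothetical and not part of Airavata.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the newest log timestamp written by a given thread. */
class LastLogEntry {

    // Assumed layout, taken from the excerpt in this report:
    // 2017-09-20 00:16:42,913 [Thread-5] INFO  o.a.a.g.m.e.EmailBasedMonitor  - message
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) \\[([^\\]]+)\\] (\\w+)\\s+(\\S+)\\s+- (.*)$");

    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the latest timestamp logged by the named thread, or empty if none matched. */
    static Optional<LocalDateTime> lastEntry(List<String> lines, String thread) {
        LocalDateTime last = null;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            // Continuation lines (for example the quoted email body) do not match and are skipped.
            if (m.matches() && m.group(2).equals(thread)) {
                LocalDateTime ts = LocalDateTime.parse(m.group(1), TS);
                if (last == null || ts.isAfter(last)) {
                    last = ts;
                }
            }
        }
        return Optional.ofNullable(last);
    }
}
```

      Filtering by thread name rather than logger name is a deliberate choice here: in the excerpt the parser and the monitor log under different loggers but on the same thread (Thread-5).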

      If an error had been thrown, I think it would have been logged, since the EmailBasedMonitor thread catches and logs Throwable.
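
      The reasoning above can be illustrated with a minimal Java sketch. It is not the Airavata code: the class, the PollStep interface, and the in-memory LOG list are hypothetical stand-ins. It shows that a loop which catches Throwable logs a failure and carries on, whereas a call that never returns leaves the thread alive but silent. That second case is one scenario consistent with the logs in this report; it is not a confirmed cause.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a monitor loop that catches and logs Throwable. */
class MonitorLoopSketch {

    /** Stand-in for one polling pass (for example, fetching and parsing emails). */
    interface PollStep {
        void run(int pass) throws Exception;
    }

    /** In-memory "log" so the behaviour can be inspected; a real monitor would use a logger. */
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    /** Starts a daemon thread that runs the step repeatedly, logging anything it throws. */
    static Thread startMonitor(PollStep step, int passes) {
        Thread t = new Thread(() -> {
            for (int pass = 0; pass < passes; pass++) {
                try {
                    step.run(pass);
                    LOG.add("pass " + pass + " ok");
                } catch (Throwable e) {
                    // Anything thrown is logged and the loop continues.
                    LOG.add("pass " + pass + " error: " + e.getMessage());
                }
            }
        });
        t.setDaemon(true); // a stuck thread should not keep the JVM alive
        t.start();
        return t;
    }

    /** Pass 0 succeeds, pass 1 throws, pass 2 blocks forever. Returns whether the thread is alive. */
    static boolean demo() {
        LOG.clear();
        CountDownLatch reachedBlockingCall = new CountDownLatch(1);
        CountDownLatch neverReleased = new CountDownLatch(1);
        Thread t = startMonitor(pass -> {
            if (pass == 1) {
                throw new IllegalStateException("parse failure");
            }
            if (pass == 2) {
                reachedBlockingCall.countDown();
                neverReleased.await(); // simulates a call that never returns
            }
        }, 5);
        try {
            // Wait until the monitor thread has entered the blocking call.
            reachedBlockingCall.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t.isAlive();
    }

    public static void main(String[] args) {
        boolean alive = demo();
        LOG.forEach(System.out::println);
        System.out.println("thread alive: " + alive);
    }
}
```

      In this sketch the thrown exception on pass 1 appears in the log, while the blocked pass 2 produces no entry at all even though the thread is still alive. A thread dump of the gfac process would be one way to check whether the real monitor thread is in a similar state.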




            Assignee: Unassigned
            Reporter: Marcus Christie (marcuschristie)