  1. OFBiz
  2. OFBIZ-6784

JobSandbox: reloading a crashed job may duplicate a pending service

    Details

      Description

      When the JobPoller runs reloadCrashedJobs() after an OFBiz restart, a long-running service that crashed after it had already scheduled its next run receives an additional new run instance.

      Example: suppose a service such as loadExternalOrder runs every hour. If you stop OFBiz while it is running, then at restart you have:

      • job1 loadExternalOrder RUNNING
      • job2 loadExternalOrder PENDING at t+1h (normal schedule)
      • job1.1 loadExternalOrder PENDING at t+1h (crashed schedule)

      I propose to exclude from reloadCrashedJobs() all jobs that already have a new scheduled instance.

        Activity

        soledad Nicolas Malin added a comment -

        Here is the patch with my idea for solving the problem.

        We don't reload a crashed service if a child instance is already present with status PENDING, RUNNING or QUEUED.
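
The check described above could be sketched roughly as follows. This is only an illustration on a hypothetical in-memory model, not the attached patch: the class names `SketchJob` and `CrashedJobFilter`, the `parentJobId` field, and the plain status strings are assumptions made for the example and are not taken from the OFBiz code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a JobSandbox row; not the real OFBiz entity.
final class SketchJob {
    final String jobId;
    final String parentJobId; // id of the job that scheduled this one, or null (assumed field)
    final String status;

    SketchJob(String jobId, String parentJobId, String status) {
        this.jobId = jobId;
        this.parentJobId = parentJobId;
        this.status = status;
    }
}

final class CrashedJobFilter {
    // Statuses that count as "a child instance is already present".
    private static final Set<String> LIVE =
            new HashSet<>(Arrays.asList("PENDING", "RUNNING", "QUEUED"));

    // Returns true when the crashed job has no live child and should be reloaded.
    static boolean shouldReload(SketchJob crashed, List<SketchJob> allJobs) {
        for (SketchJob job : allJobs) {
            if (crashed.jobId.equals(job.parentJobId) && LIVE.contains(job.status)) {
                return false; // a follow-up instance already exists, skip the reload
            }
        }
        return true;
    }
}
```

With the jobs from the description, job1 would not be reloaded because job2 is already pending as its child.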

        soledad Nicolas Malin added a comment -

        I have this problem on a production site running 13.07, so I would be interested in backporting this to all stable branches.

        I am open to a review.

        lektran Scott Gray added a comment -

        The new job created by the crashed job is set to run immediately, not after one hour.

        I disagree with this change. What if the job runs weekly/monthly/yearly instead of hourly? I think it is better to run a recurring job too often than not often enough.

        soledad Nicolas Malin added a comment -

        Thanks Scott for sharing.

        The new job created by the crashed job is set to run immediately, not after one hour.

        Right, the job runs immediately, but it is also scheduled one hour later. That is the problem: each job then continues its own life cycle, so at the next crash it is not one job that gets replayed but two, which in turn creates four jobs ...
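
To make the growth concrete, here is a small worked sketch. It assumes the worst case, which is not stated in the thread: every recurring chain is mid-run at each crash, and every reloaded job keeps its recurrence. The class and method names are hypothetical.

```java
// Hypothetical model of how recurring chains multiply across crashes.
final class ChainGrowthSketch {
    // Number of recurring chains after the given number of crashes,
    // assuming every chain is running at crash time (worst case).
    static long chainsAfter(int crashes) {
        long chains = 1;
        for (int i = 0; i < crashes; i++) {
            // Each running job already scheduled its successor, and the
            // reload adds one more recurring copy, so the count doubles.
            chains *= 2;
        }
        return chains;
    }
}
```

Under these assumptions one chain becomes two after the first crash and four after the second.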

        What if the job runs weekly/monthly/yearly instead of hourly?

        Good spot, I focused only on the most frequent jobs.
        So I will look for a better solution in the replay process. It is important to ensure that one crashed job replays only one job.

        lektran Scott Gray added a comment -

        Right the job run immediately, but the job is also planned one hour later.

        Weird, I thought this had been fixed already. reloadCrashedJobs shouldn't store recurrence/temporal info on the new job if the crashed job's status is SERVICE_RUNNING, because the init() of the crashed job will have already scheduled it before the server went down. It's just a matter of setting tempExprId and recurrenceInfoId to null before storing the new job.
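
A minimal sketch of that idea, on a hypothetical in-memory row rather than the real entity API. Only `tempExprId`, `recurrenceInfoId` and the `SERVICE_RUNNING` status come from this thread; the `JobRow` and `ReloadSketch` classes, the `buildReplacement` method and the `SERVICE_PENDING` status given to the new job are assumptions for illustration.

```java
// Hypothetical stand-in for a JobSandbox row; only the fields named in this thread are modeled.
final class JobRow {
    final String jobId;
    final String statusId;
    String tempExprId;
    String recurrenceInfoId;

    JobRow(String jobId, String statusId, String tempExprId, String recurrenceInfoId) {
        this.jobId = jobId;
        this.statusId = statusId;
        this.tempExprId = tempExprId;
        this.recurrenceInfoId = recurrenceInfoId;
    }
}

final class ReloadSketch {
    // Builds the job that replaces a crashed one (not the actual reloadCrashedJobs code).
    static JobRow buildReplacement(JobRow crashed, String newJobId) {
        JobRow replacement = new JobRow(
                newJobId, "SERVICE_PENDING", crashed.tempExprId, crashed.recurrenceInfoId);
        if ("SERVICE_RUNNING".equals(crashed.statusId)) {
            // The crashed job was already running, so its next occurrence has
            // been scheduled; the replacement must be a one-shot job.
            replacement.tempExprId = null;
            replacement.recurrenceInfoId = null;
        }
        return replacement;
    }
}
```

A job that crashed before it started running keeps its recurrence info in this sketch, since nothing has scheduled its next occurrence yet.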

        soledad Nicolas Malin added a comment -

        Thanks a lot Scott, that is the right correction. I will apply it.

        soledad Nicolas Malin added a comment -

        OK, done in the following revisions:

        • trunk : 1722816
        • 15.12 : 1722819
        • 14.12 : 1722817
        • 13.07 : 1722820
        • 12.04 : 1722821

        Thanks Scott for your solution.


          People

          • Assignee: soledad Nicolas Malin
          • Reporter: soledad Nicolas Malin