REEF (Retired) / REEF-1691

Should not request extra Evaluators if an Evaluator failed in the WaitingForEvaluator state


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.16
    • None

    Description

      When Evaluators fail in both the WaitingForEvaluator state and TaskRunningState, recovery uses _failedEvaluatorsCount to decide how many new Evaluators to request. That count includes the Evaluators that failed in both states, even though replacements have already been requested for the ones that failed in the WaitingForEvaluator state. As a result, extra Evaluators are requested. This is a regression caused by REEF-1677.

      With REEF-1688, even though the condition is loosened so that additional Evaluators are ignored, an additional allocated Evaluator can still arrive in another state, because we change the system state as soon as we have all the Evaluators needed. Receiving an additional allocated Evaluator in such an unexpected state results in an IMRUSystemException.
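      The failure mode described above can be sketched as a tiny state machine. This is only an illustrative Python model, not the REEF driver code: the class DriverSketch, the method on_allocated_evaluator, and the state SUBMITTING_TASKS are hypothetical names, and IMRUSystemException here is a stand-in for the exception type named in the description.

```python
from enum import Enum, auto


class SystemState(Enum):
    # Hypothetical, simplified set of states; the real driver has more.
    WAITING_FOR_EVALUATOR = auto()
    SUBMITTING_TASKS = auto()


class IMRUSystemException(Exception):
    """Stand-in for the exception type named in the description."""


class DriverSketch:
    """Hypothetical model of a driver that counts allocated Evaluators."""

    def __init__(self, needed):
        self.needed = needed
        self.allocated = 0
        self.state = SystemState.WAITING_FOR_EVALUATOR

    def on_allocated_evaluator(self):
        # An allocation is only expected while waiting for Evaluators.
        if self.state is not SystemState.WAITING_FOR_EVALUATOR:
            raise IMRUSystemException(
                f"allocated Evaluator received in state {self.state.name}")
        self.allocated += 1
        if self.allocated == self.needed:
            # The state changes as soon as the last needed Evaluator arrives,
            # so any over-requested Evaluator will land in a later state.
            self.state = SystemState.SUBMITTING_TASKS
```

      In this model, a driver that needs two Evaluators but has requested three raises on the third allocation, because the state has already moved on by the time it arrives.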

      The fix is to request, in recovery, replacements only for the Evaluators that failed during or after task submission.
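      The difference between the regressed and the fixed request count can be illustrated with a short Python sketch. It assumes the single failure counter is split into two hypothetical counters, one per phase; the names FailureCounts, failed_while_waiting, and failed_during_or_after_submit are invented for illustration and do not come from the REEF source.

```python
from dataclasses import dataclass


@dataclass
class FailureCounts:
    # Hypothetical split of the single counter described in the issue.
    failed_while_waiting: int = 0           # replacements already requested
    failed_during_or_after_submit: int = 0  # not yet replaced


def evaluators_to_request_buggy(counts):
    # Regressed behavior: every failure is counted, so Evaluators that
    # were already replaced while waiting are requested a second time.
    return counts.failed_while_waiting + counts.failed_during_or_after_submit


def evaluators_to_request_fixed(counts):
    # Fixed behavior: only failures during or after task submission
    # still need a replacement in recovery.
    return counts.failed_during_or_after_submit


counts = FailureCounts(failed_while_waiting=2, failed_during_or_after_submit=3)
extra = evaluators_to_request_buggy(counts) - evaluators_to_request_fixed(counts)
```

      With two failures while waiting and three after submission, the regressed count asks for five Evaluators where three are needed, so extra is 2: exactly the number of Evaluators that had already been replaced.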

            People

              Assignee: Julia Wang (juliaw)
              Reporter: Julia Wang (juliaw)
              Votes: 0
              Watchers: 2
