  1. Airavata
  2. AIRAVATA-2742

Helix Controller throws an Exception when the participant is killed


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.18
    • Fix Version/s: None
    • Component/s: helix implementation
    • Labels: None

    Description

      This was a sporadic issue that occurred only once in the test setup. There were 5 to 10 tasks running in the Participant when it was killed externally with SIGTERM (kill <process-id>). When the Participant was started again, it did not pick up the tasks it had been running at the time it was killed. Surprisingly, the respective workflows remained in IN_PROGRESS status. The Helix Controller log showed the following error for each workflow. This looks like a bug in Helix, and I have posted the issue to the Helix mailing list (Subject: Sporadic issue when restarting a Participant).

       
      2018-04-06 15:10:57,766 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage  - Error computing assignment for resource Workflow_of_process_PROCESS_7f6c8a54-b50f-4bdb-aafd-59ce87276527-POST-b5e39e07-2d8e-4309-be5a-f5b6067f9a24_TASK_cc8039e5-f054-4dea-8c7f-07c98077b117. Skipping.
      java.lang.NullPointerException: Name is null
              at java.lang.Enum.valueOf(Enum.java:236)
              at org.apache.helix.task.TaskPartitionState.valueOf(TaskPartitionState.java:25)
              at org.apache.helix.task.JobRebalancer.computeResourceMapping(JobRebalancer.java:272)
              at org.apache.helix.task.JobRebalancer.computeBestPossiblePartitionState(JobRebalancer.java:140)
              at org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(BestPossibleStateCalcStage.java:171)
              at org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(BestPossibleStateCalcStage.java:66)
              at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:48)
              at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:295)
              at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
      2018-04-06 15:11:00,385 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage  - Error computing assignment for resource Workflow_of_process_PROCESS_2b69b499-c527-4c9d-8b2b-db17366f5f81-POST-c67607ae-9177-4a02-af8a-8b3751eea4ff_TASK_1ea6876d-f2ec-4139-a15d-0e64a80a3025. Skipping. 
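
The top two frames of the trace show TaskPartitionState.valueOf being handed a null name, which Enum.valueOf rejects with a NullPointerException. The sketch below reproduces only that mechanism and shows one null-tolerant way to parse such a value. PartitionState, StateParsing, messageForNullName and parse are hypothetical names introduced for illustration. This is not Helix code, and the report does not say which value is null at JobRebalancer.java:272.

```java
import java.util.Optional;

// Hypothetical stand-in for org.apache.helix.task.TaskPartitionState.
// The constants are illustrative only, not the real Helix list.
enum PartitionState { INIT, RUNNING, COMPLETED }

final class StateParsing {

    // Mirrors the failing call: the compiler-generated valueOf(String)
    // delegates to Enum.valueOf(Class, String), which throws a
    // NullPointerException when the name is null.
    static String messageForNullName() {
        try {
            PartitionState.valueOf((String) null);
            return "no exception";
        } catch (NullPointerException e) {
            return String.valueOf(e.getMessage());
        }
    }

    // Null-tolerant alternative (an assumption about a possible guard,
    // not the actual Helix fix): a missing or unrecognised state name
    // yields an empty Optional instead of an exception.
    static Optional<PartitionState> parse(String name) {
        if (name == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(PartitionState.valueOf(name));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(messageForNullName());
        System.out.println(parse(null).isPresent());
        System.out.println(parse("RUNNING").isPresent());
    }
}
```

A guard of this kind would let the caller decide what a missing state means, for example treating the partition as unassigned, rather than aborting the assignment computation for the whole resource.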
       

          People

            Assignee: dimuthuupe Dimuthu
            Reporter: dimuthuupe Dimuthu
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: