Apache Gobblin / GOBBLIN-1721

Give option to cancel helix workflow through Delete API to avoid job hanging

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: gobblin-cluster
    • Labels: None

    Description

      Currently, when we receive a job restart (handleUpdateJobConfigArrival), GobblinHelixJobLauncher first calls helixTaskDriver.waitToStop to stop the existing workflow and then launches the new one. We have observed Helix taking an exceptionally long time to stop the workflow, leaving the job stuck in the STOPPING state. As a result, waitToStop consistently times out and throws an exception, so the new flow is never launched.

      Since our jobs are stateless from Helix's perspective, we can use the Delete API here instead of waiting for the workflow to stop, which avoids the job hanging.
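      To illustrate the difference between the two restart paths, here is a minimal, self-contained Java sketch. It does not use the real Helix library: WorkflowDriver, StuckStoppingDriver, JobRestartSketch, handleUpdate, and the cancelByDelete option are all hypothetical names. The fake driver's waitToStop always times out to mimic a workflow stuck in STOPPING, and its delete is assumed to take effect immediately (a real Helix delete may itself complete asynchronously).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for the driver calls used on restart; NOT the real Helix TaskDriver API. */
interface WorkflowDriver {
  void waitToStop(String workflow, long timeoutMs) throws TimeoutException;
  void delete(String workflow);
  void start(String workflow);
}

/** Fake driver reproducing the reported symptom: stopping never completes, deleting does. */
class StuckStoppingDriver implements WorkflowDriver {
  final List<String> running = new ArrayList<>();

  StuckStoppingDriver(String existingWorkflow) {
    running.add(existingWorkflow);
  }

  @Override
  public void waitToStop(String workflow, long timeoutMs) throws TimeoutException {
    // The workflow stays in STOPPING, so the wait always times out.
    throw new TimeoutException(workflow + " still STOPPING after " + timeoutMs + " ms");
  }

  @Override
  public void delete(String workflow) {
    // Assumption: deletion takes effect immediately in this fake.
    running.remove(workflow);
  }

  @Override
  public void start(String workflow) {
    running.add(workflow);
  }
}

class JobRestartSketch {
  /**
   * Cancels the old workflow and launches the new one.
   * cancelByDelete is a hypothetical option modeling the behavior proposed in this issue.
   * Returns true if the new workflow was launched.
   */
  static boolean handleUpdate(WorkflowDriver driver, String oldWorkflow, String newWorkflow,
                              boolean cancelByDelete, long timeoutMs) {
    try {
      if (cancelByDelete) {
        // Stateless job: dropping the old workflow is safe, no need to wait for STOPPED.
        driver.delete(oldWorkflow);
      } else {
        // Existing behavior: block until the workflow is reported as stopped.
        driver.waitToStop(oldWorkflow, timeoutMs);
      }
    } catch (TimeoutException e) {
      // The stop never finished, so the new workflow is never launched.
      return false;
    }
    driver.start(newWorkflow);
    return true;
  }

  public static void main(String[] args) {
    StuckStoppingDriver stopPath = new StuckStoppingDriver("job_v1");
    System.out.println(handleUpdate(stopPath, "job_v1", "job_v2", false, 1000) + " " + stopPath.running);
    StuckStoppingDriver deletePath = new StuckStoppingDriver("job_v1");
    System.out.println(handleUpdate(deletePath, "job_v1", "job_v2", true, 1000) + " " + deletePath.running);
  }
}
```

      Running main prints "false [job_v1]" followed by "true [job_v2]": the wait-to-stop path leaves the old workflow in place and never starts the new one, while the delete path replaces it.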

          People

            hutran Hung Tran
            hanghangliu Hanghang Liu

          Time Tracking

            Original Estimate: Not Specified
            Remaining Estimate: 0h
            Time Spent: 1h 10m