  1. Beam
  2. BEAM-8539

Clearly define the valid job state transitions

Details

    Description

      The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream.

      I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find one. The code reveals that the SDKs differ on the fundamentals:

      Java InMemoryJobService:

      • start state: STOPPED
      • run - about to submit to executor: STARTING
      • run - actually running on executor: RUNNING
      • terminal states: DONE, FAILED, CANCELLED, DRAINED

      Python AbstractJobServiceServicer / LocalJobServicer:

      • start state: STARTING
      • terminal states: DONE, FAILED, CANCELLED, STOPPED
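      To make the disagreement concrete, the two lists above can be written down as data and compared. This is only a sketch: the OBSERVED dict and both helper functions are hypothetical names made up for illustration, and the states are copied from the lists in this issue rather than read from the Beam source.

```python
# States reported in this issue for each job service implementation.
# Hypothetical summary for comparison only; this is not Beam code.
OBSERVED = {
    "java": {
        "start": "STOPPED",
        "terminal": {"DONE", "FAILED", "CANCELLED", "DRAINED"},
    },
    "python": {
        "start": "STARTING",
        "terminal": {"DONE", "FAILED", "CANCELLED", "STOPPED"},
    },
}


def terminal_disagreements(a, b):
    """Return states that exactly one of the two implementations treats as terminal."""
    return OBSERVED[a]["terminal"] ^ OBSERVED[b]["terminal"]


def ambiguous_states():
    """Return states that are a start state in one SDK and terminal in another."""
    starts = {impl["start"] for impl in OBSERVED.values()}
    terminals = set().union(*(impl["terminal"] for impl in OBSERVED.values()))
    return starts & terminals


print(sorted(terminal_disagreements("java", "python")))  # ['DRAINED', 'STOPPED']
print(sorted(ambiguous_states()))                        # ['STOPPED']
```

      The second result is the core of the problem: a client that sees STOPPED on the state stream cannot tell, without knowing which SDK served the job, whether the job has not started yet or has already finished.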

      I think it would be good to make Python work like Java, so that there is a difference in state between a job that has been prepared and one that has additionally been run.

      It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to define the terminal states as an enum in beam_job_api.proto, or to create a utility function in each language for checking whether a state is terminal, so that it's not left up to each runner to reimplement this logic.
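      A minimal sketch of what such a utility could look like in Python. Everything here is an assumption for illustration: JobState is a local stand-in that lists only the states named in this issue (it is not the generated proto class), the terminal set follows the Java InMemoryJobService list above, and is_terminal and wait_for_terminal are hypothetical names.

```python
from enum import Enum


class JobState(Enum):
    """Local stand-in for the job state enum; only states named in this issue."""
    STOPPED = "STOPPED"
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    DRAINED = "DRAINED"


# Assumption: the Java terminal set is treated as canonical, so STOPPED is
# a start state and not a terminal one.
TERMINAL_STATES = frozenset({
    JobState.DONE,
    JobState.FAILED,
    JobState.CANCELLED,
    JobState.DRAINED,
})


def is_terminal(state):
    """Return True if no further transitions are expected from this state."""
    return state in TERMINAL_STATES


def wait_for_terminal(states):
    """Consume an iterable of states; return the first terminal one, else None."""
    for state in states:
        if is_terminal(state):
            return state
    return None


stream = [JobState.STOPPED, JobState.STARTING, JobState.RUNNING, JobState.DONE]
print(wait_for_terminal(stream))  # JobState.DONE
```

      With a single shared definition like this, a runner or client that consumes the state stream only needs to call is_terminal instead of keeping its own list of final states.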

       

          People

            lcwik Luke Cwik
            chadrik Chad Dombrova

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 6.5h