Apache Gobblin / GOBBLIN-1982

Show a consistent flowExecutionId between Compilation & Execution

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: gobblin-service
    • Labels: None

    Description

      The problem addressed in this issue is how to determine a unique ID per execution that all hosts agree upon and that is computed before any information (about compilation or execution) is returned to the user.

      Upon receiving the request for an adhoc flow, the recipient host creates a flowExecutionId when initializing FlowSpec from config for non-scheduled flows (see code). This flowExecutionId is returned to the user for tracking the flow status. This should not change later on.

      Scheduled flows are fired on each host at a different system clock time, so they need a consensus mechanism to coordinate between hosts. During multiActiveLeaseArbitration we update the flowExecutionId of a DagAction with an agreed-upon value from the database to gain this consistency. However, this should only be done for scheduled flows, and no information about the flowExecutionId should be emitted externally until that consensus is reached.

      To address the problems above, we:

      1) skip flowExecutionId replacement for adhoc flows

      2) remove the flow compilation and GTE emission that happen before consensus on the flowExecutionId is reached.

      Removing this compilation check has no significant impact. It will result in dagActions being created for flows that may fail compilation later (after lease arbitration and before execution). Since we already compile the flow on accepting it, a slight delay in failing a flow is acceptable.
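
      The rule in step 1 can be illustrated with a minimal Java sketch. This is not the Gobblin implementation: the class FlowExecutionIdSketch, the Action stand-in for a DagAction, the resolve method, and the isScheduled flag are all hypothetical names chosen for illustration, and the consensus value is assumed to have already been read from the database during lease arbitration.

```java
/** Illustrative sketch only; these names are hypothetical, not Gobblin APIs. */
final class FlowExecutionIdSketch {

  /** Minimal stand-in for a DagAction, holding only the fields used here. */
  static final class Action {
    final String flowGroup;
    final String flowName;
    final long flowExecutionId;

    Action(String flowGroup, String flowName, long flowExecutionId) {
      this.flowGroup = flowGroup;
      this.flowName = flowName;
      this.flowExecutionId = flowExecutionId;
    }
  }

  private FlowExecutionIdSketch() {}

  /**
   * Picks the action to proceed with after lease arbitration.
   * Adhoc flows keep the ID assigned when the FlowSpec was created, because
   * that ID has already been returned to the user. Scheduled flows adopt the
   * value all hosts agreed upon (assumed here to come from the database).
   */
  static Action resolve(Action original, boolean isScheduled, long consensusFlowExecutionId) {
    if (!isScheduled) {
      return original; // adhoc: skip flowExecutionId replacement
    }
    return new Action(original.flowGroup, original.flowName, consensusFlowExecutionId);
  }

  public static void main(String[] args) {
    Action adhoc = new Action("exampleGroup", "adhocFlow", 1000L);
    Action scheduled = new Action("exampleGroup", "scheduledFlow", 2000L);
    long consensusId = 2005L; // made-up value standing in for the database result

    System.out.println(resolve(adhoc, false, consensusId).flowExecutionId);
    System.out.println(resolve(scheduled, true, consensusId).flowExecutionId);
  }
}
```

      In this sketch the adhoc action is returned unchanged, so the ID the user is tracking stays valid, while the scheduled action is rebuilt with the agreed value before anything about it is reported externally.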

          People

            abti Abhishek Tiwari
            umustafi Urmi Mustafi
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h