Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
trunk
None
Description
All dataset instances specified as input to a coordinator currently use AND logic, i.e. ALL of them must be available before the workflow can start. We should enhance this to support more flexible ways of specifying availability criteria, e.g.:
- OR between instances
- minimum N out of K instances
- delta datasets (process data incrementally)
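To make the proposed criteria concrete, here is a minimal Python sketch that models each criterion as a predicate over the set of currently available dataset instances. It is not Oozie's implementation or its coordinator configuration syntax; the helper names `ready`, `all_of`, `any_of`, `at_least` and `delta`, and the instance identifiers, are hypothetical and chosen only for illustration.

```python
from typing import Callable, FrozenSet, List

# Hypothetical model (not Oozie's API): an instance is a string ID and
# `avail` is the set of instances that currently exist.
Available = FrozenSet[str]
Predicate = Callable[[Available], bool]


def ready(instance: str) -> Predicate:
    """Leaf criterion: a single dataset instance is available."""
    return lambda avail: instance in avail


def all_of(*preds: Predicate) -> Predicate:
    """AND: every sub-criterion holds (the current coordinator behaviour)."""
    return lambda avail: all(p(avail) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """OR: at least one sub-criterion holds."""
    return lambda avail: any(p(avail) for p in preds)


def at_least(n: int, *preds: Predicate) -> Predicate:
    """Minimum N out of K: at least n of the sub-criteria hold."""
    return lambda avail: sum(1 for p in preds if p(avail)) >= n


def delta(avail: Available, processed: Available) -> List[str]:
    """Delta datasets: instances available now but not yet processed."""
    return sorted(avail - processed)


# BCP-style case: run with either copy, whichever is complete first.
bcp = any_of(
    all_of(ready("A-00"), ready("A-01")),
    all_of(ready("B-00"), ready("B-01")),
)
# Unreliable data: start once any 3 of the 4 hourly instances exist.
partial = at_least(3, *(ready(f"C-{h:02d}") for h in range(4)))

avail = frozenset({"B-00", "B-01", "C-00", "C-02", "C-03"})
print(bcp(avail))                          # True: copy B is complete
print(partial(avail))                      # True: 3 of 4 C instances exist
print(delta(avail, frozenset({"C-00"})))   # ['B-00', 'B-01', 'C-02', 'C-03']
```

Because the combinators take other predicates as arguments, they nest freely (for example, an OR of two ANDs for redundant copies of a dataset), which is one possible way to express the criteria listed above without a separate rule for each combination.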
Use cases:
- Different datasets are BCP (business continuity) copies of one another, and the workflow can run with either one, whichever arrives first.
- Data availability is not guaranteed. While $coord:latest allows skipping to the instances that are available, the workflow will never trigger unless the specified number of instances is found.
- The workflow is like a ‘refining’ algorithm: it should run once the minimum required datasets are ready, and should process only the delta for efficiency.
This JIRA is for discussing the design and then reviewing the implementation of some or all of the above features.
Attachments
Issue Links
- breaks
  - OOZIE-3031 Coord job with only unresolved dependencies doesn't timeout (Closed)
- links to