Details
-
New Feature
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
Currently, Hyracks only accepts jobs in the form of DAGs (i.e. no cycles) of operators. To implement any form of recursion, an "external manager" (e.g. Pregelix) must repeatedly issue Hyracks jobs until a fixed-point is reached (i.e. there are no tuples left to process). This idea scales well for implementing custom graph algorithms, but not for operational workloads that require recursion on a significantly smaller portion of a graph due to (a) the overhead of starting and stopping a Hyracks job and (b) the larger-than-necessary (in many cases) idleness of many workers, and c) the need to synchronize for each iteration. Ideally, it would be nice to have some sort of recursion built into Hyracks itself.
Unfortunately, there is a tradeoff between generalizability and efficiency here. A great first step (and one that is needed for the Graphix project) is to handle some common operations like as reachability / shortest-path in a tuple-pipelineable and partitioned-parallel manner. These operations, instead of having an "external manager" handle the recursion in synchrony across workers, would be handled by Hyracks itself. Hyracks users that want to create a directed job graph with loops would then just submit a Hyracks job with a few special properties (e.g. having an operator in the job to detect when there are no more tuples in the loop).
Later down the line, if someone else decides to implement an extended version of SQL++ that has recursion, they can leverage these operators to answer the "are we at a fixed-point" question for the directed job graphs they'll generate.