Description
In many applications (e.g., logistic regression), the amount of data sent over the network between tasklets and the Vortex master is proportional to the number of tasklets. This I/O cost can become a performance or scalability bottleneck when a computation is distributed across many machines.
As in MapReduce, results from tasklets can often be combined or aggregated locally before being sent back, which reduces the total amount of I/O. In this (sub-)task, we explore the API and runtime enhancements needed to enable such local aggregation whenever possible.
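To make the I/O argument concrete, here is a minimal, self-contained Java sketch. It is not the Vortex API: the class `LocalAggregationSketch` and all of its methods are hypothetical names invented for illustration, and the "network" is simulated as a list of messages. It assumes the aggregation is a simple sum (for example, summing partial gradients), which is associative and commutative and can therefore be applied on each worker before reporting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Vortex.
final class LocalAggregationSketch {

  /** No aggregation: every tasklet result is sent to the master as its own message. */
  static List<Double> reportIndividually(List<List<Double>> resultsPerWorker) {
    List<Double> messages = new ArrayList<>();
    for (List<Double> workerResults : resultsPerWorker) {
      messages.addAll(workerResults);
    }
    return messages;
  }

  /** Local aggregation: each worker sums its tasklet results and sends one message. */
  static List<Double> reportAggregated(List<List<Double>> resultsPerWorker) {
    List<Double> messages = new ArrayList<>();
    for (List<Double> workerResults : resultsPerWorker) {
      if (workerResults.isEmpty()) {
        continue; // a worker with no finished tasklets has nothing to report
      }
      double partialSum = 0.0;
      for (double result : workerResults) {
        partialSum += result;
      }
      messages.add(partialSum);
    }
    return messages;
  }

  /** Master side: combine whatever messages arrived into the final result. */
  static double combine(List<Double> messages) {
    double total = 0.0;
    for (double message : messages) {
      total += message;
    }
    return total;
  }

  public static void main(String[] args) {
    // Three simulated workers running 3, 2, and 1 tasklets respectively.
    List<List<Double>> results = Arrays.asList(
        Arrays.asList(1.0, 2.0, 3.0),
        Arrays.asList(4.0, 5.0),
        Arrays.asList(6.0));

    List<Double> individual = reportIndividually(results);
    List<Double> aggregated = reportAggregated(results);

    System.out.println("messages without aggregation: " + individual.size());
    System.out.println("messages with aggregation:    " + aggregated.size());
    System.out.println("same final result: "
        + (combine(individual) == combine(aggregated)));
  }
}
```

In this sketch the number of messages reaching the master drops from one per tasklet to one per worker, while the combined result is unchanged. A real runtime would also have to decide when a worker flushes its partial aggregate, which is a policy question the sketch deliberately leaves out.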
Issue Links
- incorporates
  - REEF-1075 Create Tasklet abstraction to enable one-to-many mapping of "Future-like" object to TaskletIds (Resolved)
  - REEF-1077 Allow VortexWorker reports to report about more than a single Tasklet (Resolved)
  - REEF-1084 Modify TaskletReport to allow aggregation of Tasklets for a single result/error (Resolved)
  - REEF-1129 Driver side VortexAggregate submission (Resolved)
  - REEF-1130 Worker side VortexAggregateFunction reception (Resolved)
  - REEF-1131 Create, define, and apply VortexAggregationPolicy (Resolved)
  - REEF-1132 Define VortexAggregateFuture and VortexAggregateFunction (Resolved)