Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version: SystemML 1.2

    Description

  In the context of ML, it would be more efficient to support data partitioning in a distributed manner. This task aims to perform the data partitioning on Spark: the data is first split among the workers, each worker then partitions its local data according to the chosen scheme, and the resulting partitions remain on the workers so they can be passed directly to model training without being materialized on HDFS.
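  The flow above can be sketched in plain Python, with workers simulated as in-memory lists. The function names and the round-robin assignment scheme are illustrative assumptions, not SystemML's actual API or partitioning scheme:

  ```python
  # Sketch: split rows among workers, partition locally on each worker,
  # and feed the partitions straight into training (no HDFS write).
  # Round-robin assignment stands in for the real partitioning scheme.

  def shuffle_to_workers(rows, num_workers):
      """Step 1: split the input rows among the workers."""
      workers = [[] for _ in range(num_workers)]
      for i, row in enumerate(rows):
          workers[i % num_workers].append(row)
      return workers

  def partition_locally(local_rows, num_partitions):
      """Step 2: each worker partitions its own data per the scheme."""
      parts = [[] for _ in range(num_partitions)]
      for i, row in enumerate(local_rows):
          parts[i % num_partitions].append(row)
      return parts

  def train(partition):
      """Step 3: partitions feed training directly; a sum stands in
      for an actual model-training step."""
      return sum(partition)

  rows = list(range(10))
  workers = shuffle_to_workers(rows, num_workers=2)
  results = [train(p) for w in workers for p in partition_locally(w, 2)]
  ```

  In a real Spark job the two steps would typically map to a shuffle across executors followed by a per-partition operation, keeping the partitioned RDD cached in executor memory rather than writing it out.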

      People

        Guobao LI