Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Currently, we assume that all the data resides on the Master, which does not scale well. Instead, we can load data from external storage, targeting HDFS as the first implementation. Also, since data loading is usually expensive, it would be good to cache the loaded data, which can be integrated with REEF-1100.
FYI, REEF already has data loading, but it is less suitable for Vortex because it includes more than we need (e.g., resource allocation). We can reuse some classes, such as DataSet, though.
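
To make the proposal more concrete, here is a rough sketch of how loading and caching could be layered. All names in it (ExternalDataLoader, CachingDataLoader, CountingFakeLoader) are hypothetical and are not existing REEF or Vortex classes. The fake loader only stands in for an HDFS-backed implementation, and the in-memory map stands in for whatever cache REEF-1100 ends up providing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface (not part of REEF or Vortex): loads the records stored at a path.
interface ExternalDataLoader<T> {
  List<T> load(String path);
}

// Wraps any loader and keeps the result per path, so each path is loaded at most once.
final class CachingDataLoader<T> implements ExternalDataLoader<T> {
  private final ExternalDataLoader<T> delegate;
  private final Map<String, List<T>> cache = new ConcurrentHashMap<>();

  CachingDataLoader(final ExternalDataLoader<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public List<T> load(final String path) {
    // computeIfAbsent calls the delegate only when the path has not been cached yet.
    return cache.computeIfAbsent(path, delegate::load);
  }
}

// Stand-in for an HDFS-backed loader: returns fixed records and counts how often it is called.
final class CountingFakeLoader implements ExternalDataLoader<String> {
  final AtomicInteger loads = new AtomicInteger();

  @Override
  public List<String> load(final String path) {
    loads.incrementAndGet();
    return Arrays.asList(path + "#record0", path + "#record1");
  }
}
```

With this layering, the expensive read happens once per path and later requests are served from memory. A real implementation would presumably also need eviction and a way to share the cache across Workers, which this sketch leaves out.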