Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
It would be very useful to have a public dataset loader that fetches ML datasets from popular repositories, e.g., LIBSVM and UCI. This JIRA is for discussing the design, requirements, and initial implementation. A possible API:
val loader = new DatasetLoader(sqlContext)
val df = loader.get("libsvm", "rcv1_train.binary")
Users should be able to list (or preview) the available datasets, e.g.:
val datasets = loader.ls("libsvm") // returns a local DataFrame
datasets.show() // list all datasets under the libsvm repo
It would be nice to allow third-party packages to register new repos. Both the API and the implementation are pending discussion. Note that fetching datasets from remote repos requires HTTP and HTTPS support.
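As a starting point for that discussion, here is one possible shape for repo registration. It is only a sketch: `DatasetRepo`, `DatasetRepos`, `register`, `lookup`, `urlFor`, and the `ExampleRepo` with its placeholder URL are all hypothetical names invented for illustration, not an agreed API. It leaves out Spark and the actual download so that it stays self-contained.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical interface that a repo implementation would provide.
trait DatasetRepo {
  def name: String                     // repo key, e.g. "libsvm"
  def list(): Seq[String]              // names of the datasets the repo offers
  def urlFor(dataset: String): String  // download URL (http or https)
}

// Hypothetical registry; a third-party package would call register() once.
object DatasetRepos {
  private val repos = TrieMap.empty[String, DatasetRepo]

  def register(repo: DatasetRepo): Unit = {
    // putIfAbsent returns the previous value if the key was already taken.
    val previous = repos.putIfAbsent(repo.name, repo)
    require(previous.isEmpty, s"repo '${repo.name}' is already registered")
  }

  def lookup(name: String): Option[DatasetRepo] = repos.get(name)
}

// Example repo with a placeholder base URL, not a real endpoint.
object ExampleRepo extends DatasetRepo {
  val name = "example"
  private val baseUrl = "https://datasets.example.org/"
  private val known = Seq("toy_train.binary", "toy_test.binary")

  def list(): Seq[String] = known

  def urlFor(dataset: String): String = {
    require(known.contains(dataset), s"unknown dataset: $dataset")
    baseUrl + dataset
  }
}
```

With something like this in place, loader.get(repo, dataset) could resolve the repo through the registry, fetch the URL it returns, and parse the result into a DataFrame, while loader.ls(repo) could be built on list(). Whether registration should be explicit, as here, or discovered automatically is one of the open design questions.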