Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: 2.0.0
Component/s: None
Labels: None
The only execution engine currently supported in Sqoop is MapReduce (MR).
The goal of this ticket is to support running Sqoop jobs (map-only and map+reduce) in a Spark environment.
At a minimum it should support running on a standalone Spark cluster, and subsequently work with YARN/Mesos.
High level goals:
1. Hook into the connector APIs to provide the basic load/extract operations on top of a Spark RDD (a rough sketch follows this list).
2. Implement a Sqoop RDD to support extraction from different data sources. The design proposal will discuss alternative ways this can be achieved.
3. Optimize the loading/writing path (re-use/refactor the consumer thread code so it is agnostic of the Hadoop OutputFormat; see the loader sketch after this list).
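To make goals 1 and 2 concrete, below is a minimal Scala sketch of one possible shape for such an RDD: each Spark partition corresponds to one connector-side partition, and compute() delegates the actual reading to the connector's extractor. Note that SqoopPartition and SqoopExtractor here are hypothetical stand-ins for Sqoop's Partitioner/Extractor contracts, not the real connector API; the exact wiring is what the design proposal would decide.

{code}
import scala.reflect.ClassTag

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-ins for Sqoop's connector-side contracts.
trait SqoopPartition extends Serializable
trait SqoopExtractor[T] extends Serializable {
  def extract(split: SqoopPartition): Iterator[T]
}

// Ties a Spark partition index to one connector-side split.
private class SqoopSparkPartition(val index: Int, val split: SqoopPartition)
  extends Partition

// Sketch of a "SqoopRDD": one Spark partition per connector partition,
// with compute() delegating the reading to the extractor.
class SqoopRDD[T: ClassTag](
    sc: SparkContext,
    splits: Seq[SqoopPartition],
    extractor: SqoopExtractor[T])
  extends RDD[T](sc, Nil) {

  override protected def getPartitions: Array[Partition] =
    splits.zipWithIndex
      .map { case (s, i) => new SqoopSparkPartition(i, s) }
      .toArray

  override def compute(part: Partition, context: TaskContext): Iterator[T] =
    extractor.extract(part.asInstanceOf[SqoopSparkPartition].split)
}
{code}

A driver could then build this RDD from the splits produced by the connector's partitioner and apply any map-phase transformations before loading.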
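For goal 3, one way to keep the write path agnostic of the Hadoop OutputFormat is to hide it behind a loader interface that only consumes an iterator of records, driven on Spark via foreachPartition. Again, SqoopLoader is an illustrative placeholder rather than Sqoop's actual Loader API.

{code}
import org.apache.spark.rdd.RDD

// Hypothetical loader contract: the write path only sees an iterator of
// records, so the same consumer code can run under MR or Spark without
// referencing a Hadoop OutputFormat.
trait SqoopLoader[T] extends Serializable {
  def load(records: Iterator[T]): Unit
}

// Drive the loader once per Spark partition; each task streams its
// records straight into the loader.
def runLoad[T](rdd: RDD[T], loader: SqoopLoader[T]): Unit =
  rdd.foreachPartition((records: Iterator[T]) => loader.load(records))
{code}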