Details
- Type: Improvement
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
Description
The current execution engine supported in Sqoop is MapReduce (MR).
The goal of this ticket is to support running Sqoop jobs (map-only and map+reduce) on a Spark environment.
At a minimum, it should support running on a standalone Spark cluster, and subsequently work with YARN/Mesos.
High-level goals
1. Hook into the connector APIs to provide basic load/extract against the Spark RDD.
2. Implement a Sqoop RDD to support extraction from different data sources. The design proposal will discuss the alternatives for how this can be achieved.
3. Optimize loading/writing (re-use/refactor the consumer thread code to be agnostic of the Hadoop output format).
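To make goal 2 concrete, a Sqoop-style RDD would typically map each connector-produced split of the source data onto one Spark partition, whose compute step then runs the connector's extractor over that split. The sketch below illustrates only the partitioning half of that idea in plain Java, without a Spark dependency; the names `RangePartition` and `splitRange` are hypothetical and not part of the Sqoop or Spark APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splitting a numeric split-column range into
// contiguous partitions, the way a Sqoop partitioner might feed a
// custom Spark RDD. In a real SqoopRDD, each RangePartition would back
// one RDD partition whose compute() invokes the connector's Extractor.
public class RangePartitionSketch {

    // One partition = one half-open range [lo, hi) of the split column.
    static final class RangePartition {
        final long lo;
        final long hi;
        RangePartition(long lo, long hi) { this.lo = lo; this.hi = hi; }
        @Override public String toString() { return "[" + lo + "," + hi + ")"; }
    }

    // Split [min, max) into numPartitions contiguous ranges, spreading
    // any remainder across the first few partitions so sizes differ by
    // at most one row.
    static List<RangePartition> splitRange(long min, long max, int numPartitions) {
        List<RangePartition> parts = new ArrayList<>();
        long span = max - min;
        long base = span / numPartitions;
        long rem  = span % numPartitions;
        long lo = min;
        for (int i = 0; i < numPartitions; i++) {
            long size = base + (i < rem ? 1 : 0);
            parts.add(new RangePartition(lo, lo + size));
            lo += size;
        }
        return parts;
    }

    public static void main(String[] args) {
        // 10 rows split three ways -> partitions of size 4, 3, 3.
        for (RangePartition p : splitRange(0, 10, 3)) {
            System.out.println(p);
        }
    }
}
```

The same shape generalizes to non-numeric split columns by substituting a different boundary-computation strategy; the point is that partition boundaries are computed up front on the driver, while per-partition extraction happens inside the tasks.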