Details
- Type: Improvement
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
Description
The current execution engine supported in Sqoop is MapReduce (MR).
The goal of this ticket is to support running Sqoop jobs (map-only and map+reduce) on a Spark environment.
At a minimum, it should support running on a standalone Spark cluster, and subsequently work with YARN/Mesos.
High-level goals
1. Hook into the connector APIs to provide basic load/extract against the Spark RDD.
2. Implement a Sqoop RDD to support extraction from different data sources. The design proposal will discuss the alternatives for how this can be achieved.
3. Optimize loading/writing (re-use/refactor the consumer thread code to be agnostic of the Hadoop output format).
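To make goal 2 concrete, a Sqoop-style RDD would typically map each connector-produced split of the source data onto one Spark partition, whose compute step then runs the connector's extractor over that split. The sketch below illustrates only the partitioning half of that idea in plain Java, without a Spark dependency; the names `RangePartition` and `splitRange` are hypothetical and not part of the Sqoop or Spark APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splitting a numeric split-column range into
// contiguous partitions, the way a Sqoop partitioner might feed a
// custom Spark RDD. In a real SqoopRDD, each RangePartition would back
// one RDD partition whose compute() invokes the connector's Extractor.
public class RangePartitionSketch {

    // One partition = one half-open range [lo, hi) of the split column.
    static final class RangePartition {
        final long lo;
        final long hi;
        RangePartition(long lo, long hi) { this.lo = lo; this.hi = hi; }
        @Override public String toString() { return "[" + lo + "," + hi + ")"; }
    }

    // Split [min, max) into numPartitions contiguous ranges, spreading
    // any remainder across the first few partitions so sizes differ by
    // at most one row.
    static List<RangePartition> splitRange(long min, long max, int numPartitions) {
        List<RangePartition> parts = new ArrayList<>();
        long span = max - min;
        long base = span / numPartitions;
        long rem  = span % numPartitions;
        long lo = min;
        for (int i = 0; i < numPartitions; i++) {
            long size = base + (i < rem ? 1 : 0);
            parts.add(new RangePartition(lo, lo + size));
            lo += size;
        }
        return parts;
    }

    public static void main(String[] args) {
        // 10 rows split three ways -> partitions of size 4, 3, 3.
        for (RangePartition p : splitRange(0, 10, 3)) {
            System.out.println(p);
        }
    }
}
```

The same shape generalizes to non-numeric split columns by substituting a different boundary-computation strategy; the point is that partition boundaries are computed up front on the driver, while per-partition extraction happens inside the tasks.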