[SQOOP-1601] Sqoop2: To part of the Connector API to support balancing/re-partitioning step - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: 2.0.0
Component/s: None
Labels: None

Description

To recap, the Sqoop job lifecycle today looks like this:
Step 1: Initializers for both data sources (FROM and TO)
Step 2: Partitioner (for the data from the FROM data source)
Step 3: Extractor (the actual reading from the FROM data source)
Step 4: Loader (for the TO data source, i.e. writing the data)
Step 5: Destroyers for both data sources
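
For illustration, the five steps above can be sketched as a single sequential run. This is only a minimal model: the class and method names (`LifecycleSketch`, `runJob`, `partition`) are hypothetical and are not the actual Sqoop2 connector API, and the round-robin partitioning is an assumption made to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the job lifecycle.
// None of these names are real Sqoop2 connector API signatures.
class LifecycleSketch {

    // Runs the five steps in order and returns a trace of what happened.
    static List<String> runJob(List<String> sourceRows, int numPartitions) {
        List<String> trace = new ArrayList<>();

        // Step 1: initializers for both data sources.
        trace.add("init FROM");
        trace.add("init TO");

        // Step 2: the partitioner splits the FROM data.
        List<List<String>> partitions = partition(sourceRows, numPartitions);
        trace.add("partitioned into " + partitions.size());

        // Steps 3 and 4: an extractor reads each partition and a loader
        // writes every extracted row to the TO side.
        List<String> written = new ArrayList<>();
        for (List<String> part : partitions) {
            for (String row : part) {   // extractor reads
                written.add(row);       // loader writes
            }
        }
        trace.add("loaded " + written.size() + " rows");

        // Step 5: destroyers for both data sources.
        trace.add("destroy FROM");
        trace.add("destroy TO");
        return trace;
    }

    // Assumed round-robin split, just for the sketch.
    static List<List<String>> partition(List<String> rows, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one partition");
        }
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            parts.get(i % n).add(rows.get(i));
        }
        return parts;
    }
}
```

Note that in this model nothing sits between the extractor and the loader, which is the gap this issue is about.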

Extractors and Loaders are each parallelized, so the number of each to use can be set through the driver config (numExtractors and numLoaders).

But when there is an imbalance between the extractors and loaders, we may need an intermediate step to rebalance, repartition, or shuffle the data as it is written by the Loaders. We do not support such a step today. It might be good to provide an additional step that connectors can add where relevant, for better control over the load step.
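
As a rough sketch of what such an intermediate step could do, the example below spreads unevenly sized extractor outputs across a fixed number of loaders. The name `RebalanceSketch.rebalance` is hypothetical, and round-robin is just one possible strategy assumed here. Nothing like this exists in the connector API today.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rebalancing step between extractors and loaders.
// Not part of the Sqoop2 connector API; round-robin is an assumed strategy.
class RebalanceSketch {

    // Takes one list of records per extractor and returns one list per
    // loader, with the per-loader sizes differing by at most one record.
    static <T> List<List<T>> rebalance(List<List<T>> extractorOutputs, int numLoaders) {
        if (numLoaders < 1) {
            throw new IllegalArgumentException("need at least one loader");
        }
        List<List<T>> loaderInputs = new ArrayList<>();
        for (int i = 0; i < numLoaders; i++) {
            loaderInputs.add(new ArrayList<>());
        }
        int next = 0; // index of the loader that receives the next record
        for (List<T> output : extractorOutputs) {
            for (T record : output) {
                loaderInputs.get(next).add(record);
                next = (next + 1) % numLoaders;
            }
        }
        return loaderInputs;
    }
}
```

For example, two extractors that produced five records and one record would feed three loaders two records each, instead of leaving one loader with most of the work.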

We should also discuss whether this step can be a generic one that operates on or transforms the output as it is written to the TO data source.

Issue Links

relates to: SQOOP-1603 Sqoop2: Explicit support for Merge in the Sqoop Job lifecycle in the MR engine (Open)

People

Assignee: Unassigned

Reporter: Veena Basavaraj

Votes: 1

Watchers: 3

Dates

Created: 22/Oct/14 16:50

Updated: 13/Jan/15 22:13