[SQOOP-2920] sqoop performance deteriorates significantly on wide datasets; sqoop 100% on cpu - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.4.5, 1.4.6
Fix Version/s: 1.4.7
Component/s: connectors/oracle, hive-integration, metastore
Labels:
- columns
- hive
- oracle
- perfomance
Environment:
- sqoop export on a very wide dataset (over 700 columns)
- sqoop export to Oracle
- a subset of columns is exported (using the --columns argument)
- Parquet files
- the --table, --hcatalog-database and --hcatalog-table options are used

Description

We sqoop export from our data lake to Oracle quite often.
When we export "narrow" datasets, Oracle is the bottleneck: our 3-node all-flash Oracle RAC normally can't keep up with more than 45-55 sqoop mappers, and the MapReduce framework shows that the sqoop mappers are lightly loaded.

On wide datasets the picture is the opposite. Oracle shows 95% of sessions idle, waiting for new INSERTs, even when we go over a hundred mappers. Sqoop has serious scalability issues on very wide datasets. (Our company normally has very wide datasets.)

For example, on the latest sqoop export, which started ~2.5 hours ago, 95 mappers have already accumulated:
CPU time spent (ms): 1,065,858,760
(this metric is taken from the MapReduce framework stats)

That is about 1 million seconds of CPU time, or 11,219.57 seconds per mapper, which is roughly 3.12 hours of CPU time per mapper.
So they are 100% on CPU.
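
The arithmetic above can be checked with a short sketch. The three input figures come from this report; the variable names are only illustrative, and the utilization estimate assumes that all 95 mappers ran for the full ~2.5 hours, which the report does not state explicitly.

```python
# Figures reported in this issue.
cpu_time_ms = 1_065_858_760   # "CPU time spent (ms)" from the MapReduce stats
mappers = 95                  # number of sqoop mappers
elapsed_hours = 2.5           # approximate wall-clock time since job start

# Convert the aggregate counter to seconds, then split it across mappers.
cpu_seconds_total = cpu_time_ms / 1000
cpu_seconds_per_mapper = cpu_seconds_total / mappers
cpu_hours_per_mapper = cpu_seconds_per_mapper / 3600

# Assumption: every mapper ran for the whole elapsed time.
# A ratio near or above 1.0 means each mapper kept about one core busy.
utilization = cpu_hours_per_mapper / elapsed_hours

print(f"total CPU seconds:      {cpu_seconds_total:,.2f}")
print(f"CPU seconds per mapper: {cpu_seconds_per_mapper:,.2f}")
print(f"CPU hours per mapper:   {cpu_hours_per_mapper:.2f}")
print(f"CPU hours / wall hours: {utilization:.2f}")
```

Under that assumption the ratio comes out slightly above 1, which is consistent with mappers that are CPU-bound rather than waiting on Oracle.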

Will also attach jstack files.

Attachments

- jstack.zip (77 kB), 02/May/16 17:55, Ruslan Dautkhanov
- top - sqoop mappers hog cpu.png (70 kB), 02/May/16 22:17, Ruslan Dautkhanov
- SQOOP-2920.patch (11 kB), 17/May/16 21:41, Attila Szabo

Issue Links

links to

Review ticket

People

Assignee: Attila Szabo
Reporter: Ruslan Dautkhanov
Votes: 0
Watchers: 8

Dates

Created: 02/May/16 17:52
Updated: 19/May/16 15:31
Resolved: 18/May/16 20:28