Description
hdfs dfs -put local s3a://path is suboptimal: it treewalks down the source tree and then, sequentially, copies each file up by opening it as a stream, reading its contents into a buffer, and writing that buffer out to the destination file, repeating for every file.
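For illustration, a minimal sketch of what that sequential copy loop amounts to, written against the public FileSystem API; the bucket, source path, buffer size and flattened destination layout are invented here, and the real shell code differs in detail:

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.io.IOUtils;

public class SequentialPut {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem local = FileSystem.getLocal(conf);
    FileSystem s3a = FileSystem.get(new URI("s3a://bucket/"), conf);
    Path destRoot = new Path("s3a://bucket/dest");

    // recursive treewalk of the source tree...
    RemoteIterator<LocatedFileStatus> files =
        local.listFiles(new Path("/data/src"), true);
    // ...then, one file at a time, a buffered stream copy to the store
    while (files.hasNext()) {
      Path src = files.next().getPath();
      Path dest = new Path(destRoot, src.getName()); // layout simplified
      try (FSDataInputStream in = local.open(src);
           FSDataOutputStream out = s3a.create(dest)) {
        IOUtils.copyBytes(in, out, 4096, false); // buffer-at-a-time copy
      }
    }
  }
}
{code}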
For S3A that hurts because:
- it's doing the upload inefficiently: the file can be uploaded just by handing the pathname to the AWS transfer manager (see the sketch after this list).
- it is doing it sequentially, when a parallelised upload would work.
- as the ordering of the files to upload is a recursive treewalk, it doesn't spread the upload across multiple shards.
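On the first point, a hedged sketch of the one-call alternative: FileSystem.copyFromLocalFile() passes the source path down to the filesystem implementation, so S3A can hand the whole file to the transfer manager instead of streaming it through a buffer. The bucket and filenames are made up for the example:

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OneCallUpload {
  public static void main(String[] args) throws Exception {
    FileSystem s3a = FileSystem.get(new URI("s3a://bucket/"), new Configuration());
    // delSrc = false keeps the local file; overwrite = true replaces any existing dest
    s3a.copyFromLocalFile(false, true,
        new Path("file:///data/src/part-0000"),
        new Path("s3a://bucket/dest/part-0000"));
  }
}
{code}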
Better:
- build the list of files to upload
- upload in parallel, picking entries from the list at random and spreading across a pool of uploaders
- upload straight from the local file (copyFromLocalFile()).
- track IO load (files created/second) to estimate the risk of throttling; the combined flow is sketched below.
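Putting those steps together, a sketch under stated assumptions: a fixed-size thread pool, a flattened destination layout, and a crude files/second counter. UPLOAD_THREADS, the class name and the paths are illustrative, not an existing API:

{code:java}
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ParallelPut {
  private static final int UPLOAD_THREADS = 16; // illustrative pool size

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem local = FileSystem.getLocal(conf);
    FileSystem s3a = FileSystem.get(new URI("s3a://bucket/"), conf);
    Path destRoot = new Path("s3a://bucket/dest");

    // 1. build the list of files to upload
    List<Path> files = new ArrayList<>();
    RemoteIterator<LocatedFileStatus> it =
        local.listFiles(new Path("/data/src"), true);
    while (it.hasNext()) {
      files.add(it.next().getPath());
    }

    // 2. randomise the ordering so uploads spread across shards
    Collections.shuffle(files);

    // 3. upload in parallel, straight from the local file
    ExecutorService pool = Executors.newFixedThreadPool(UPLOAD_THREADS);
    AtomicLong uploaded = new AtomicLong();
    long start = System.currentTimeMillis();
    List<Future<?>> pending = new ArrayList<>();
    for (Path src : files) {
      pending.add(pool.submit(() -> {
        Path dest = new Path(destRoot, src.getName()); // layout simplified
        s3a.copyFromLocalFile(false, true, src, dest);
        return uploaded.incrementAndGet();
      }));
    }
    for (Future<?> f : pending) {
      f.get(); // propagate any upload failure
    }
    pool.shutdown();

    // 4. files created/second as a crude proxy for throttling risk
    double seconds = (System.currentTimeMillis() - start) / 1000.0;
    System.out.printf("uploaded %d files at %.1f files/second%n",
        uploaded.get(), uploaded.get() / Math.max(seconds, 0.001));
  }
}
{code}

The shuffle is doing real work here: a depth-first treewalk ordering would keep hitting one key prefix after another, which is exactly the shard-hotspotting the files/second counter is trying to surface.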
Issue Links
- depends upon: HADOOP-14432 S3A copyFromLocalFile to be robust, tested (Resolved)
- is related to: HADOOP-14767 WASB to implement copyFromLocalFile() (Resolved)