Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
None
None
Hadoop Flags: Reviewed
Description
CopyFromLocal/Put is not currently multithreaded.
When multiple files need to be uploaded to HDFS, a single thread reads each file in turn and copies its data to the cluster.
The upload could be made faster by copying multiple files in parallel.
I am attaching an initial patch to get some early feedback.
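The proposal above amounts to running one copy task per file on a bounded pool of worker threads instead of copying files one after another. A minimal sketch of that idea, under loudly stated assumptions: it uses plain Java NIO on the local filesystem, with the destination directory standing in for HDFS, and `ParallelCopy`/`copyAll` are hypothetical names that are not part of Hadoop's shell code or the attached patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: copy several files concurrently using a fixed-size thread pool. */
public final class ParallelCopy {
    private ParallelCopy() {}

    /**
     * Copies each source file into destDir using at most `threads` worker threads.
     * Assumption: destDir is a local directory standing in for the HDFS target.
     * Blocks until every copy finishes and returns the target paths in input order.
     */
    public static List<Path> copyAll(List<Path> sources, Path destDir, int threads)
            throws IOException, InterruptedException {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : sources) {
                // Each file is an independent task, so several files are read and written at once.
                // Files sharing a name would overwrite each other in destDir.
                pending.add(pool.submit(() -> Files.copy(
                        src, destDir.resolve(src.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING)));
            }
            List<Path> copied = new ArrayList<>();
            for (Future<Path> f : pending) {
                try {
                    copied.add(f.get()); // waits for this file's copy to finish
                } catch (ExecutionException e) {
                    // Surface the first failed copy to the caller as an IOException.
                    Throwable cause = e.getCause();
                    if (cause instanceof IOException) {
                        throw (IOException) cause;
                    }
                    throw new IOException(cause);
                }
            }
            return copied;
        } finally {
            // Interrupts any copies still running if an earlier one failed.
            pool.shutdownNow();
        }
    }
}
```

A fixed pool bounds how many files are open at once, which keeps a large upload from exhausting local file handles or saturating the cluster with connections; the thread count is the knob that trades throughput against that load.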
Attachments
Issue Links
- breaks HADOOP-14752 TestCopyFromLocal#testCopyFromLocalWithThreads is fleaky (Resolved)
- is related to HADOOP-14698 Make copyFromLocal's -t option available for put as well (Resolved)