Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 2.7.0, 2.7.1
- None
Description
For very large source trees on S3, DistCp takes a long time to build the file listing (client-side code that runs before the mappers start). For the dataset I used (1.5M files, 50K directories), building the listing took 65 minutes before my fix in HADOOP-11785 and 36 minutes after it.
Attachments
Issue Links
- breaks
  - HADOOP-12087 [JDK8] Fix javadoc errors caused by incorrect or illegal tags (Resolved)
  - HDFS-9612 DistCp worker threads are not terminated after jobs are done. (Resolved)
- is depended upon by
  - HADOOP-11694 Über-jira: S3a phase II: robustness, scale and performance (Resolved)
- relates to
  - HADOOP-17531 DistCp: Reduce memory usage on copying huge directories (Resolved)