- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: 3.0.0-alpha1
- Fix Version/s: 2.8.0, 3.0.0-alpha1
- Component/s: tools/distcp
- Labels: None
- Target Version/s:

DistCp was taking a long time in copyListing.buildListing() for large source trees (I was using a source of 1.5M files in a tree of about 50K directories). For input on S3, buildListing was taking more than one hour. I've noticed a performance bug in the current code: it calls listStatus twice for each directory, which doubles the number of RPCs in some cases (if most directories do not contain >1000 files).
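
To illustrate the double-listing cost described above, here is a minimal, self-contained sketch. It does not use the real Hadoop classes: `CountingFs`, `walkListingTwice`, `walkListingOnce` and `sampleTree` are hypothetical names made up for this example, and the "list once to check for emptiness, then list again to traverse" pattern is only one plausible shape of the problem, not a copy of the DistCp code. Each `listStatus` call on the stand-in file system models one listing RPC.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListingSketch {

    /** Hypothetical in-memory stand-in for a remote file system; counts listing calls. */
    static class CountingFs {
        private final Map<String, List<String>> children = new HashMap<>();
        int listCalls = 0;

        void addDir(String dir, String... entries) {
            children.put(dir, List.of(entries));
        }

        boolean isDir(String path) {
            return children.containsKey(path);
        }

        /** Each call models one listing RPC against the source. */
        List<String> listStatus(String dir) {
            listCalls++;
            return children.get(dir);
        }
    }

    /** Assumed wasteful pattern: list to check for emptiness, then list again to traverse. */
    static List<String> walkListingTwice(CountingFs fs, String root) {
        List<String> out = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String dir = pending.pop();
            if (fs.listStatus(dir).isEmpty()) {      // first call: emptiness check only
                continue;
            }
            for (String child : fs.listStatus(dir)) { // second call on the same directory
                out.add(child);
                if (fs.isDir(child)) {
                    pending.push(child);
                }
            }
        }
        return out;
    }

    /** Same traversal, but the single listing result is reused. */
    static List<String> walkListingOnce(CountingFs fs, String root) {
        List<String> out = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String dir = pending.pop();
            for (String child : fs.listStatus(dir)) { // only call for this directory
                out.add(child);
                if (fs.isDir(child)) {
                    pending.push(child);
                }
            }
        }
        return out;
    }

    /** Small sample tree: three non-empty directories and three files. */
    static CountingFs sampleTree() {
        CountingFs fs = new CountingFs();
        fs.addDir("/src", "/src/a", "/src/b", "/src/f0");
        fs.addDir("/src/a", "/src/a/f1");
        fs.addDir("/src/b", "/src/b/f2");
        return fs;
    }

    public static void main(String[] args) {
        CountingFs twice = sampleTree();
        CountingFs once = sampleTree();
        walkListingTwice(twice, "/src");
        walkListingOnce(once, "/src");
        System.out.println("listStatus calls, listing twice: " + twice.listCalls);
        System.out.println("listStatus calls, listing once:  " + once.listCalls);
    }
}
```

On the sample tree, both walks return the same entries, but the first makes six listing calls and the second makes three. With roughly 50K directories the same ratio would apply wherever a directory's listing fits in a single response.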