Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
1.4
None
None
None
Description
Profiling a Samza job showed that, for that job, ~38% of the time was spent in org.apache.samza.storage.blobstore.util.DirDiffUtil.getDirDiff, with areSameFile as the primary contributor.
The code contains the following comment:
DirDiffUtil.java:271
// TODO MED shesharm: this compares each file in directory 3 times. Categorize files in one traversal instead.
Refactor DirDiffUtil.getDirDiff to loop through all file names once, reducing the number of calls to areSameFile.test.
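A minimal sketch of the single-traversal idea, for illustration only. This is not the DirDiffUtil code: the class SinglePassDiff, the Result buckets, and the map-based inputs are hypothetical names, and the sketch assumes files can be matched by name so that the comparison predicate only needs to run once per name present on both sides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical sketch; not the Samza DirDiffUtil implementation. */
final class SinglePassDiff {

  /** Buckets of file names produced by the traversal. */
  static final class Result {
    final List<String> added = new ArrayList<>();    // present locally only
    final List<String> retained = new ArrayList<>(); // present on both sides, same content
    final List<String> changed = new ArrayList<>();  // present on both sides, different content
    final List<String> removed = new ArrayList<>();  // present remotely only
  }

  /**
   * Categorizes every name in one pass over the local entries, calling
   * areSameFile at most once per name that exists on both sides.
   */
  static <L, R> Result categorize(
      Map<String, L> local, Map<String, R> remote, BiPredicate<L, R> areSameFile) {
    Result result = new Result();
    for (Map.Entry<String, L> entry : local.entrySet()) {
      String name = entry.getKey();
      if (!remote.containsKey(name)) {
        result.added.add(name);
      } else if (areSameFile.test(entry.getValue(), remote.get(name))) {
        result.retained.add(name);
      } else {
        result.changed.add(name);
      }
    }
    // Remote-only names need a key lookup but no content comparison.
    for (String name : remote.keySet()) {
      if (!local.containsKey(name)) {
        result.removed.add(name);
      }
    }
    return result;
  }
}
```

With this shape the expensive predicate runs once per shared name, rather than once per category being computed.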