Samza / SAMZA-2783

Re-factor DirDiffUtil.getDirDiff to avoid repeated calls to areSameFile

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • 1.4
    • None
    • None
    • None

    Description

      Profiling a Samza job showed that roughly 38% of that job's time was spent in org.apache.samza.storage.blobstore.util.DirDiffUtil.getDirDiff, with areSameFile as the primary contributor.

       

      Looking at the code it has the following comment:

      DirDiffUtil.java:271

        // TODO MED shesharm: this compares each file in directory 3 times. Categorize files in one traversal instead.

       

      Re-factor DirDiffUtil.getDirDiff to loop through all names once, reducing the number of calls to areSameFile.test.
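
      The proposed single traversal could look roughly like the sketch below. This is an illustration only, not the actual Samza code: the class SinglePassDirDiff, the Result holder, and the categorize method are hypothetical names, the maps stand in for the local files and the remote file indexes keyed by name, and the sameFile predicate stands in for areSameFile. It also assumes that a file whose contents changed is both uploaded and removed, and that map values are non-null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical sketch: categorize file names in one traversal. */
public class SinglePassDirDiff {

  /** Hypothetical result holder; not Samza's DirDiff class. */
  public static final class Result {
    public final List<String> added = new ArrayList<>();
    public final List<String> retained = new ArrayList<>();
    public final List<String> removed = new ArrayList<>();
  }

  /**
   * Walks the local names once. The (potentially expensive) sameFile
   * predicate is evaluated at most once per name, and only for names
   * that exist on both sides.
   */
  public static <L, R> Result categorize(
      Map<String, L> local, Map<String, R> remote, BiPredicate<L, R> sameFile) {
    Result result = new Result();
    // Copy the remote view so matched names can be crossed off as we go.
    Map<String, R> unmatched = new HashMap<>(remote);

    for (Map.Entry<String, L> entry : local.entrySet()) {
      String name = entry.getKey();
      R remoteFile = unmatched.remove(name);
      if (remoteFile == null) {
        // Present locally only: needs to be uploaded.
        result.added.add(name);
      } else if (sameFile.test(entry.getValue(), remoteFile)) {
        // Present on both sides and unchanged: keep the remote copy.
        result.retained.add(name);
      } else {
        // Present on both sides but different (assumption: upload the
        // new version and remove the stale remote one).
        result.added.add(name);
        result.removed.add(name);
      }
    }
    // Whatever was never matched exists remotely only.
    result.removed.addAll(unmatched.keySet());
    return result;
  }
}
```

      With local files {a, b, c} and remote files {a, b, d}, where only a is unchanged, this sketch evaluates the predicate twice (for a and b) rather than once per category per file.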

          People

            Assignee: Unassigned
            Reporter: Andy Sautins (asautins)
            Votes: 0
            Watchers: 1
