  1. Crunch (Retired)
  2. CRUNCH-256

SequentialFileNamingScheme should cache the # of files in the target directory after the first read

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None

    Description

      After a job finishes running, the post-job hooks rename the files from a temp output directory to the target output directory. When there are many files, this move can take a long time. I traced the performance issue to the fact that SequentialFileNamingScheme calls listStatus() on the output directory for every file that gets moved. If SequentialFileNamingScheme did this check only once and then incremented an internal counter, the overhead of the move would drop significantly.
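
      The proposed change can be sketched roughly as follows. This is a minimal illustration and not the code from the attached patches: the class name CachingSequentialNamer, the IntSupplier standing in for listStatus(dir).length, and the "part-%05d" name format are all assumptions made for the example.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch (not the Crunch implementation): counts the files in the
 * target directory once, then hands out sequence numbers from a counter.
 */
class CachingSequentialNamer {
    // Stand-in for an expensive directory listing such as listStatus(dir).length.
    private final IntSupplier directoryFileCount;
    // -1 means the directory has not been read yet.
    private int nextIndex = -1;

    CachingSequentialNamer(IntSupplier directoryFileCount) {
        this.directoryFileCount = directoryFileCount;
    }

    /** Returns the next sequential name, listing the directory only on first use. */
    synchronized String nextName() {
        if (nextIndex < 0) {
            // The single expensive listing; later calls reuse the cached counter.
            nextIndex = directoryFileCount.getAsInt();
        }
        // Assumed name format, chosen only for illustration.
        return String.format("part-%05d", nextIndex++);
    }
}
```

      With a target directory that already holds three files, successive calls would yield part-00003, part-00004, and so on, while the directory is listed only once. The trade-off is that files written to the directory by anything else after the first read are not noticed by the cached counter.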

      Attachments

        1. CRUNCH-256.patch
          2 kB
          Josh Wills
        2. CRUNCH-256b.patch
          14 kB
          Josh Wills

          People

            Assignee: Josh Wills (jwills)
            Reporter: Josh Wills (jwills)
            Votes: 0
            Watchers: 3