  1. Hadoop Common
  2. HADOOP-15548

Randomize local dirs


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 2.8.5, 3.0.4
    • Component/s: None
    • Labels: None

    Description

      Shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching a container. Some applications process these lists in exactly the same way in every container (e.g. round-robin starting from the first entry), which can cause some disks to become unnecessarily overloaded (e.g. when a single output file is always written to the first entry specified in the environment variable).
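      The shuffle described above can be sketched as follows. This is a minimal illustration, not the code from the attached patches; it assumes the dir list is a comma-separated string, and the class and method names (ShuffleDirsSketch, shuffleDirList) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ShuffleDirsSketch {
    /**
     * Hypothetical helper: returns the same comma-separated dirs in a
     * random order, so containers do not all see the same first entry.
     */
    static String shuffleDirList(String commaSeparatedDirs, Random rng) {
        // Copy into a mutable list; Arrays.asList alone is a fixed-size view.
        List<String> dirs =
            new ArrayList<>(Arrays.asList(commaSeparatedDirs.split(",")));
        Collections.shuffle(dirs, rng);
        return String.join(",", dirs);
    }
}
```

      An application that always writes to the first entry would then land on a different disk from container to container, as long as each container launch shuffles the list independently.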

      There are two code paths for local dir allocation, depending on whether the requested size is unknown or known. The unknown-size path already uses a random algorithm. The known-size path initializes with a random starting point and goes round-robin after that: when selecting a dir, it increments the last-used index by one and then checks dirs sequentially until it finds one that satisfies the request. The proposal is to instead increment by a random value between 1 and num_dirs - 1, and then check sequentially from there. This should result in a more random selection in all cases.
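      The proposed known-size selection can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the class name RandomStepDirPicker, the freeBytes array standing in for per-dir available space, and the -1 return value for "no dir fits" are all hypothetical.

```java
import java.util.Random;

final class RandomStepDirPicker {
    private final long[] freeBytes; // hypothetical: available space per dir
    private final Random rng;
    private int lastUsed;           // index of the most recently chosen dir

    RandomStepDirPicker(long[] freeBytes, Random rng) {
        if (freeBytes.length == 0) {
            throw new IllegalArgumentException("need at least one dir");
        }
        this.freeBytes = freeBytes.clone();
        this.rng = rng;
        // Random starting point, as the existing known-size path already does.
        this.lastUsed = rng.nextInt(freeBytes.length);
    }

    /** Returns the index of a dir with at least size bytes free, or -1. */
    int pick(long size) {
        int n = freeBytes.length;
        // Proposed change: advance by a random step in [1, n - 1] instead
        // of always 1. With a single dir there is nowhere else to go.
        int step = (n > 1) ? 1 + rng.nextInt(n - 1) : 0;
        int start = (lastUsed + step) % n;
        // Check sequentially from the new starting point, wrapping around.
        for (int i = 0; i < n; i++) {
            int idx = (start + i) % n;
            if (freeBytes[idx] >= size) {
                lastUsed = idx;
                return idx;
            }
        }
        return -1; // no dir can satisfy the request
    }
}
```

      Because the step is never 0 or num_dirs when there is more than one dir, the search never starts at the dir that was just used, and a run of full dirs no longer funnels every request to the same next neighbor. The sketch does not update free space after a pick; a real allocator would re-read it.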

      Attachments

        1. HADOOP-15548.001.patch
          3 kB
          Jim Brennan
        2. HADOOP-15548.002.patch
          4 kB
          Jim Brennan
        3. HADOOP-15548-branch-2.001.patch
          4 kB
          Jim Brennan


          People

            Assignee: Jim Brennan (jbrennan)
            Reporter: Jim Brennan (jbrennan)
            Votes: 0
            Watchers: 8
