Hadoop HDFS
HDFS-1804

Add a new block-volume device choosing policy that looks at free space

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.4-alpha
    • Fix Version/s: 2.1.0-beta
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      There is now a new option to have the DN take into account available disk space on each volume when choosing where to place a replica when performing an HDFS write. This can be enabled by setting the config "dfs.datanode.fsdataset.volume.choosing.policy" to the value "org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy".
    • Target Version/s:

      Description

      HDFS-1120 introduced pluggable block-volume choosing policies, but still carries the vanilla RoundRobin as its default.

      An additional implementation that also takes into consideration the free space remaining on the disk (or other params) should be a good addition as an alternative to vanilla RR.

      1. HDFS-1804.patch
        28 kB
        Aaron T. Myers
      2. HDFS-1804.patch
        29 kB
        Aaron T. Myers
      3. HDFS-1804.patch
        31 kB
        Aaron T. Myers
      4. HDFS-1804.patch
        31 kB
        Aaron T. Myers
      5. HDFS-1804.patch
        31 kB
        Aaron T. Myers

        Issue Links

          Activity

          Harsh J created issue -
          Harsh J added a comment -

          Would using a k-smallest algorithm on the volumes list, along with a round robin after that be a good way to do this?

          Todd Lipcon added a comment -

          One way might be to simply choose each disk with probability relative to the amount of available space on it. So, if one disk has 2T free and another has 1T free, choose the 2T drive 2/3 of the time. Only downside is that, if two disks have an equal amount of free space, it would be better to degenerate to round-robin rather than random selection.
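
A minimal sketch of the proportional selection described above, assuming the free space of each volume is available as a `long[]` snapshot. The class and method names are hypothetical and are not taken from any patch on this issue.

```java
import java.util.Random;

// Hypothetical illustration: pick a volume index with probability
// proportional to the free space on that volume.
class ProportionalVolumeChooser {

    static int choose(long[] freeBytes, Random random) {
        long total = 0;
        for (long free : freeBytes) {
            total += free;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("no free space on any volume");
        }
        // Draw a point in [0, total) and walk the cumulative free space
        // until the running sum passes that point.
        long point = (long) (random.nextDouble() * total);
        long cumulative = 0;
        for (int i = 0; i < freeBytes.length; i++) {
            cumulative += freeBytes[i];
            if (point < cumulative) {
                return i;
            }
        }
        // Only reachable through floating-point rounding on huge totals.
        return freeBytes.length - 1;
    }

    public static void main(String[] args) {
        long[] freeBytes = {2_000L, 1_000L}; // stand-ins for 2T and 1T free
        int[] counts = new int[freeBytes.length];
        Random random = new Random();
        for (int i = 0; i < 30_000; i++) {
            counts[choose(freeBytes, random)]++;
        }
        System.out.println("volume 0: " + counts[0] + ", volume 1: " + counts[1]);
    }
}
```

Because each draw is independent, equal free space yields random rather than round-robin placement, which is the downside noted above.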

          Doug Cutting added a comment -

          Maybe a biased round robin? E.g., you might track the percentage of times that each had been allocated from and allocate from the node whose percentage is farthest from its desired probability.

          dhruba borthakur added a comment -

          Hi Doug, I think just a biased round-robin (without respect to available free space) could be problematic, couldn't it?

          Doug Cutting added a comment -

          Dhruba, I meant biased by available free space. Randomly biased by free space would, as Todd suggests, not spread load as evenly between devices as would round-robin. So, if disk A has twice the free space of disks B and C then allocations should ideally go

          {A,B,A,C,A,B,A,C...}

          rather than a stateless probabilistic allocation which would often send repeated allocations to A, impacting throughput.
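
One possible reading of the biased round robin idea, sketched under assumptions: each volume gets a desired share proportional to its free space (taken once, as a static snapshot), and every allocation goes to the volume that is furthest below its desired share. The class name and the deficit formula are illustrative choices, not something specified in this thread, and the exact interleaving it produces may differ from the sequence written above.

```java
// Hypothetical illustration of a stateful, deficit-driven round robin.
class BiasedRoundRobin {
    private final double[] desiredShare; // target fraction of allocations per volume
    private final long[] allocated;      // allocations made so far per volume
    private long total;                  // allocations made so far overall

    BiasedRoundRobin(long[] freeBytes) {
        long sum = 0;
        for (long free : freeBytes) {
            sum += free;
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("no free space on any volume");
        }
        desiredShare = new double[freeBytes.length];
        for (int i = 0; i < freeBytes.length; i++) {
            desiredShare[i] = (double) freeBytes[i] / sum;
        }
        allocated = new long[freeBytes.length];
    }

    /** Returns the index of the volume that is furthest behind its desired share. */
    int next() {
        int best = 0;
        double bestDeficit = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < desiredShare.length; i++) {
            // How many allocations this volume "should" have after the next
            // one, minus how many it actually has. Ties go to the lower index.
            double deficit = desiredShare[i] * (total + 1) - allocated[i];
            if (deficit > bestDeficit) {
                best = i;
                bestDeficit = deficit;
            }
        }
        allocated[best]++;
        total++;
        return best;
    }

    public static void main(String[] args) {
        // Disk A has twice the free space of disks B and C.
        BiasedRoundRobin chooser = new BiasedRoundRobin(new long[] {2_000L, 1_000L, 1_000L});
        StringBuilder order = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            order.append("ABC".charAt(chooser.next()));
        }
        System.out.println(order);
    }
}
```

Unlike the stateless probabilistic approach, the split here is deterministic: in this example the first eight allocations divide 4/2/2 across A, B, and C.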

          Aaron T. Myers made changes -
          Field Original Value New Value
          Labels newbie
          Harsh J made changes -
          Link This issue relates to HDFS-1120 [ HDFS-1120 ]
          Arun C Murthy made changes -
          Fix Version/s 0.24.0 [ 12317653 ]
          Fix Version/s 0.23.0 [ 12315571 ]
          Harsh J made changes -
          Link This issue relates to HDFS-1312 [ HDFS-1312 ]
          Harsh J made changes -
          Fix Version/s 0.24.0 [ 12317653 ]
          Aaron T. Myers added a comment -

          Here's a patch which addresses the issue by adding a new volume choosing policy called "AvailableSpaceVolumeChoosingPolicy".

          This policy works by first determining whether the available free space on all of the volumes is within some configurable range of each other, by default 10GB. If they are balanced in this way, then assignments are made in a strictly round robin fashion. If the available free space is not balanced across all available volumes, the volumes are bucketed as having either a lot or a little free space. We then randomly choose which of these two buckets to allocate a block to, with a configurable frequency, and within the chosen bucket we allocate blocks on a round robin basis.

          This scheme allows administrators to control both their threshold for what they consider "balanced" disks and how much they're willing to impact overall concurrent write throughput to the node vs. their desire to get volumes quickly balanced again.

          In addition to the unit tests in the patch, I also manually tested this on a single-node cluster with 4 DN volumes. It worked as expected from a correctness point of view. From a performance point of view, there was no discernible impact either when all volumes were considered balanced or when volumes were imbalanced but there were no concurrent writes.
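
The scheme described in this comment can be sketched roughly as follows. This is not the patch itself: the class name, the field names, and the rule for splitting volumes into buckets (a volume counts as "low" when its free space is within the threshold of the emptiest volume) are assumptions made for illustration, and the sketch does not scale the preference by bucket size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a free-space-aware volume chooser.
class AvailableSpaceSketch {
    private final long balancedThresholdBytes; // e.g. 10GB in the description above
    private final double preferHighFraction;   // how often the high bucket is chosen
    private final Random random;
    private int allCursor;
    private int highCursor;
    private int lowCursor;

    AvailableSpaceSketch(long balancedThresholdBytes, double preferHighFraction, Random random) {
        this.balancedThresholdBytes = balancedThresholdBytes;
        this.preferHighFraction = preferHighFraction;
        this.random = random;
    }

    /** Returns the index of the chosen volume, given a snapshot of free bytes per volume. */
    int choose(long[] freeBytes) {
        if (freeBytes.length == 0) {
            throw new IllegalArgumentException("no volumes");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long free : freeBytes) {
            min = Math.min(min, free);
            max = Math.max(max, free);
        }
        if (max - min <= balancedThresholdBytes) {
            // Balanced: strict round robin over all volumes.
            return Math.floorMod(allCursor++, freeBytes.length);
        }
        // Imbalanced: split the volumes into a low and a high bucket.
        List<Integer> low = new ArrayList<>();
        List<Integer> high = new ArrayList<>();
        for (int i = 0; i < freeBytes.length; i++) {
            if (freeBytes[i] <= min + balancedThresholdBytes) {
                low.add(i);
            } else {
                high.add(i);
            }
        }
        // Pick a bucket at random, then round robin within that bucket.
        if (random.nextDouble() < preferHighFraction) {
            return high.get(Math.floorMod(highCursor++, high.size()));
        }
        return low.get(Math.floorMod(lowCursor++, low.size()));
    }
}
```

In the imbalanced branch both buckets are guaranteed to be non-empty: the emptiest volume always lands in the low bucket, and the fullest-of-free-space volume exceeds the threshold and lands in the high bucket. Raising `preferHighFraction` rebalances faster at the cost of concentrating concurrent writes on fewer disks, which is the trade-off described above.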

          Aaron T. Myers made changes -
          Attachment HDFS-1804.patch [ 12576882 ]
          Aaron T. Myers made changes -
          Assignee Aaron T. Myers [ atm ]
          Aaron T. Myers made changes -
          Labels newbie
          Affects Version/s 2.0.4-alpha [ 12324136 ]
          Target Version/s 2.0.5-beta [ 12324031 ]
          Aaron T. Myers made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12576882/HDFS-1804.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4184//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4184//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4184//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4184//console

          This message is automatically generated.

          Aaron T. Myers added a comment -

          Updated patch to fix the RAT and findbugs warnings.

          Aaron T. Myers made changes -
          Attachment HDFS-1804.patch [ 12576940 ]
          Harsh J added a comment -

          Hi Aaron,

          Nice patch. I have some mostly superficial comments:

          • We call the following on every choosing call. While I do not imagine it to be overly expensive, would it make sense to do these calls only every N times, or only after every X bytes of total replica size written (X could be related to the space config we provide for this)?
          +  private boolean areAllVolumesWithinFreeSpaceThreshold(final List<V> volumes)
          
          • We're using ReflectionUtils with a conf object for initializing the chosen interface's implementation, so the below change to the interface (and associated changes) could perhaps be avoided if your implementation implements Configurable or extends Configured, and overrides setConf(…)?
          +  
          +  @Override
          +  public void initialize(Configuration conf) {
          +    // Nothing to initialize.
          +  }
          
          • Config names all around are getting longer and longer. How 'bout we rename this to be more implementation-specific and prefix it dfs.datanode.available.space.volume.choosing.policy.balanced-space- instead of dfs.datanode.fsdataset.volume.choosing.balanced-space-? Just a nit, but I think it then reads as applying to one specific policy.
          • Why would one not want to have this set as default? If just for initial stability reasons (I think the tests are good), can we have a JIRA to toggle it as default in future?

          Thoughts?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12576940/HDFS-1804.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4185//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4185//console

          This message is automatically generated.

          Aaron T. Myers added a comment -

          Thanks a lot for the review, Harsh. Responses to your review inline.

          We call the following on every choosing call. While I do not imagine it to be overly expensive, would it make sense to do these calls only every N times, or only after every X bytes of total replica size written (X could be related to the space config we provide for this)?

          I seriously doubt this will be a problem at all in practice, so let's hold off on doing so until it is. It wouldn't be tough to do, but I don't think the added code complexity is warranted.

          We're using ReflectionUtils with a conf object for initializing the chosen interface's implementation, so the below change to the interface (and associated changes) could perhaps be avoided if your implementation implements Configurable or extends Configured, and overrides setConf(...)?

          Real good thinking. Done.

          Config names all around are getting longer and longer. How 'bout we rename this to be more implementation-specific and prefix it dfs.datanode.available.space.volume.choosing.policy.balanced-space- instead of dfs.datanode.fsdataset.volume.choosing.balanced-space-? Just a nit, but I think it then reads as applying to one specific policy.

          Good idea. Done.

          Why would one not want to have this set as default? If just for initial stability reasons (I think the tests are good), can we have a JIRA to toggle it as default in future?

          I thought about changing it to be the default. I didn't for two reasons:

          1. As you suggest, I think it's a good idea to let this bake a bit before we make it the default all around.
          2. It's conceivable that one would not want this behavior, since in cases where many concurrent block writes happen on a single node, this policy will skew writes toward disks with more available free space, potentially impacting write throughput.

          This latest patch also fixes two bugs I discovered during more manual testing:

          1. There was a rather narrow race if the free space on a given volume changed during the course of making the volume selection. This is addressed by getting the available space for each volume once upfront and then using that for the duration of the process of choosing.
          2. There was an issue wherein if more volumes had high available free space than low available free space, the ones with low available free space would actually be allocated more writes per-volume than the others. This is fixed by properly scaling the configured preference percent to account for the size of the two buckets. I also added some more unit tests to verify this is now handled properly.
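
To see why the second bug arises and what scaling can look like, here is a small worked sketch. The formula is one plausible way to scale a preference by bucket size and is an assumption; the patch's actual code is not shown in this thread, and the class and method names are hypothetical.

```java
// Hypothetical illustration: scale the configured preference so that it
// expresses a per-volume bias rather than a per-bucket bias.
class PreferenceScaling {

    /**
     * Returns the probability of picking the high-free-space bucket such that
     * each high volume is favored over each low volume by
     * preference : (1 - preference), regardless of bucket sizes.
     */
    static double scaledPreference(int highCount, int lowCount, double preference) {
        if (highCount < 1 || lowCount < 1) {
            throw new IllegalArgumentException("both buckets must be non-empty");
        }
        double highWeight = highCount * preference;
        double lowWeight = lowCount * (1.0 - preference);
        return highWeight / (highWeight + lowWeight);
    }

    public static void main(String[] args) {
        int high = 4;
        int low = 1;
        double preference = 0.75;

        // Unscaled: the high bucket gets 75% of writes, split four ways, so
        // each high volume gets 18.75% while the single low volume gets 25%.
        System.out.println("unscaled per high volume: " + preference / high);
        System.out.println("unscaled per low volume:  " + (1.0 - preference) / low);

        // Scaled: each high volume now receives more writes than the low one.
        double scaled = scaledPreference(high, low, preference);
        System.out.println("scaled per high volume:   " + scaled / high);
        System.out.println("scaled per low volume:    " + (1.0 - scaled) / low);
    }
}
```

With four high volumes, one low volume, and a preference of 0.75, the unscaled scheme gives the low volume 25% of writes against 18.75% per high volume, which is the inversion described in item 2. After scaling, each high volume receives three times as many writes as the low volume.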
          Aaron T. Myers made changes -
          Attachment HDFS-1804.patch [ 12577135 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12577135/HDFS-1804.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4188//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4188//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4188//console

          This message is automatically generated.

          Aaron T. Myers added a comment -

          Bummer. Here's a patch to address the findbugs warning, which was spurious but it's easy enough to work around.

          Aaron T. Myers made changes -
          Attachment HDFS-1804.patch [ 12577149 ]
          Chris Nauroth added a comment -

          Hi, Aaron. This looks really good. I applied the patch locally and ran the new tests successfully. Here are a couple of thoughts:

          1. The algorithm doesn't apply a scaling factor to account for a heterogeneous set of volumes. For example, consider a mix of 500 GB disks and 2 TB disks, all initially empty. It would favor the 2 TB disks heavily until the amount of free space across all volumes (both 500 GB and 2 TB) balanced out. This would mean that the risk of harming concurrency is higher if using heterogeneous volumes. Another scaling factor for relative volume capacity could account for this. I defer to you on whether or not to consider this in scope right now or skip it and treat it as a potential future feature request to be handled in a different jira. The scope that I see in the jira description is open-ended. I don't think you can even get total capacity through FsVolumeSpi right now, so I expect it would be a much bigger change.
          2. Minor thing: the new helper methods in GenericTestUtils would have greater symmetry with JUnit's built-in assert methods if the order of arguments was: expected, actual, delta.
          3. For initialization of balancedPreferencePercent:
            private float balancedPreferencePercent = 0.75f;
          

          Do you want to use DFSConfigKeys#DFS_DATANODE_FSDATASET_VOLUME_CHOOSING_BALANCED_SPACE_PREFERENCE_PERCENT_DEFAULT here? (Though I imagine it won't really matter in practice since it gets overwritten from config.)

          Thanks, Aaron!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12577149/HDFS-1804.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4189//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4189//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4189//console

          This message is automatically generated.

          Aaron T. Myers added a comment -

          Thanks a lot for the review, Chris. Here's a patch which should address all of your comments except the first one. Let's leave that for another JIRA if we find that people are concerned with such a thing.

          This latest patch should also address the findbugs warning which was, again, spurious.

          Aaron T. Myers made changes -
          Attachment HDFS-1804.patch [ 12577177 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12577177/HDFS-1804.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4192//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4192//console

          This message is automatically generated.

          Chris Nauroth added a comment -

          +1 for the current patch. I reapplied the patch and reran the new tests successfully. Thanks for addressing the feedback. It looks great.

          Harsh J added a comment -

          "I doubt seriously this will be a problem at all in practice, so let's hold off on doing so until it does. It wouldn't be tough to do, but I don't think the added code complexity is warranted."

          Ok, makes sense. Thanks for placing in the other changes.

          Also thanks to Chris for the review!

          +1.

          Aaron T. Myers made changes -
          Summary changed from "Modify or add a new block-volume device choosing policy that looks at free space" to "Add a new block-volume device choosing policy that looks at free space"
          Aaron T. Myers added a comment -

          Thanks a lot for the reviews, Chris and Harsh. I've just committed this to trunk and branch-2.

          Aaron T. Myers made changes -
          Status changed from Patch Available [ 10002 ] to Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 2.0.5-beta [ 12324031 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3572 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3572/)
          HDFS-1804. Add a new block-volume device choosing policy that looks at free space. Contributed by Aaron T. Myers. (Revision 1465183)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465183
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/AvailableSpaceVolumeChoosingPolicy.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestAvailableSpaceVolumeChoosingPolicy.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestRoundRobinVolumeChoosingPolicy.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #176 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/176/)
          HDFS-1804. Add a new block-volume device choosing policy that looks at free space. Contributed by Aaron T. Myers. (Revision 1465183)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465183
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/AvailableSpaceVolumeChoosingPolicy.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestAvailableSpaceVolumeChoosingPolicy.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestRoundRobinVolumeChoosingPolicy.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1365 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1365/)
          HDFS-1804. Add a new block-volume device choosing policy that looks at free space. Contributed by Aaron T. Myers. (Revision 1465183)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465183
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/AvailableSpaceVolumeChoosingPolicy.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestAvailableSpaceVolumeChoosingPolicy.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestRoundRobinVolumeChoosingPolicy.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1392 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1392/)
          HDFS-1804. Add a new block-volume device choosing policy that looks at free space. Contributed by Aaron T. Myers. (Revision 1465183)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465183
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/AvailableSpaceVolumeChoosingPolicy.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestAvailableSpaceVolumeChoosingPolicy.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestRoundRobinVolumeChoosingPolicy.java
          Aaron T. Myers added a comment -

          Adding a release note to advertise this new feature better per Harsh's request in HDFS-4830.

          Aaron T. Myers made changes -
          Release Note There is now a new option to have the DN take into account available disk space on each volume when choosing where to place a replica when performing an HDFS write. This can be enabled by setting the config "dfs.datanode.fsdataset.volume.choosing.policy" to the value "org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy".
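
          The release note above names a config key and a class value. As a minimal sketch, assuming the setting goes in each DataNode's hdfs-site.xml (the usual place to override values from hdfs-default.xml) and that the DataNode must be restarted to pick it up, the property could look like this:

          ```xml
          <!-- hdfs-site.xml on each DataNode (file placement is an assumption, not stated in this issue) -->
          <property>
            <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
            <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
          </property>
          ```

          Leaving the property unset keeps the default round-robin policy that the issue description refers to.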
          Arun C Murthy made changes -
          Status changed from Resolved [ 5 ] to Closed [ 6 ]

            People

            • Assignee:
              Aaron T. Myers
              Reporter:
              Harsh J
            • Votes:
              0
              Watchers:
              23
