Hadoop HDFS > HDFS-1564

Make dfs.datanode.du.reserved configurable per volume

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

      Description

      In clusters with DNs which have heterogeneous data dir volumes, it would be nice if dfs.datanode.du.reserved could be configured per-volume.

          Activity

Allen Wittenauer added a comment -

This topic actually came up a few years ago in a JIRA, but I can't find the #.
Konstantin Shvachko added a comment -

dfs.datanode.du.reserved is per volume. Here is the description from hdfs-default.xml:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

Also, if you look at the code, the property is used in FSVolume, which corresponds to one volume.
Or am I missing what this is about?
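To illustrate how a per-volume reserved value typically enters the free-space accounting, here is a simplified sketch; the class and method names are invented for illustration and are not the actual FSVolume code:

```java
// Simplified sketch of per-volume reserved-space accounting.
// Class and method names are illustrative, not actual HDFS code.
class VolumeSketch {
    private final long reservedBytes; // analogous to dfs.datanode.du.reserved

    VolumeSketch(long reservedBytes) {
        this.reservedBytes = reservedBytes;
    }

    // Space the DataNode may still use on this volume:
    // raw free space on the filesystem minus the reserved cushion,
    // floored at zero so reservations never report negative space.
    long getAvailable(long fsFreeBytes) {
        return Math.max(fsFreeBytes - reservedBytes, 0L);
    }
}
```

Because each volume object carries its own reserved value, the question in this issue is only about how to *configure* different values per volume, not about the accounting itself.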
Aaron T. Myers added a comment -

My understanding is that the requester would like the ability to configure this independently per volume, e.g. 10 GB of reserved space for volume /fs1 and 20 GB for volume /fs2.
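One hypothetical shape for such per-volume configuration might look like the following; note these property names are invented for illustration and do not exist in hdfs-default.xml:

```xml
<!-- Hypothetical per-volume reservation keys; these property names
     are illustrative only and are not real HDFS settings. -->
<property>
  <name>dfs.datanode.du.reserved./fs1</name>
  <value>10737418240</value> <!-- 10 GB reserved on /fs1 -->
</property>
<property>
  <name>dfs.datanode.du.reserved./fs2</name>
  <value>21474836480</value> <!-- 20 GB reserved on /fs2 -->
</property>
```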
Konstantin Shvachko added a comment -

The main discussion was in HADOOP-1463, with subsequent improvements in HADOOP-2816 and HADOOP-2549.
We used to have dfs.datanode.du.pct, which defined a percentage of reserved space per volume. It was originally intended for heterogeneous systems, but it caused controversy or was not well understood, and was removed in HADOOP-4430.
I don't see any way to address the different-volumes issue other than reintroducing dfs.datanode.du.pct. If this is what the requester wants, let him specify the exact meaning of the parameter and its relation to the existing ones.
mag added a comment -

I was the reporter of this issue. Aaron relayed my problem correctly. We have many heterogeneous disks ranging from 80 GB to 2 TB, with filesystems laid out as /fs1, /fs2, /fs3, /fs4, etc. Each filesystem varies significantly in size, so it would be nice to set dfs.datanode.du.reserved (or pct) on a per-volume basis.
Harsh J added a comment -

mag - Would using a pct% value not work for you w.r.t. multiple mount points?
Rita M added a comment -

Interesting topic. I have the same issue.

Harsh, when setting the pct, it seems HDFS still gobbles up more than the reserved amount. I suspect it's reserving 1% of the entire DataNode's capacity rather than 1% per volume.
Harsh J added a comment -

Rita,

The percentage option was removed and is no longer valid. It used to apply per volume, though, and it would again if reintroduced.

The remaining point of discussion here is to decide whether we should go with:

• A percentage option: one value across all disks.
• A byte option: one value per disk.

Or leave it as is, where one byte value applies across all disks.
Rita M added a comment -

Harsh,

Definitely the byte option, one value per disk.

Here is how I would handle the use case. Define the partition:

<property>
  <name>dfs.data.dir.0</name>
  <value>/fs1</value>
  <final>true</final>
</property>

<!-- reserve 1% on /fs1 -->
<property>
  <name>dfs.data.dir.0.reservepct</name>
  <value>1</value>
  <final>true</final>
</property>

Any thoughts on this?
Harsh J added a comment -

I wonder if this still makes sense to have, or if we should just close it and keep the existing behavior, which seems to work well enough for most users (not many seem to request this or complain about it).

Rita's approach would incur too many configuration additions, considering that DNs today have 8-12 disks or even more. A comma-separated list of reservations is also expensive (and confusing) to maintain, and an admin can easily make mistakes when adding or removing volumes from the data dir set.

I propose we close this as Won't Fix for now, until there is sufficient user-driven need again. I've simply not seen enough demand for this, and setting a single reserved value across all disks also makes sense.
Prashant Kommireddi added a comment -

Wouldn't "dfs.datanode.du.pct" make sense at the node level, at least? Consider the case where one adds new hardware to the cluster with a different disk capacity than the current machines. The current way of using "dfs.datanode.du.reserved" makes it hard to, say, reserve 5% for non-DFS use.
Assaf Yardeni added a comment -

Hello,
In my use case, having "dfs.datanode.du.pct" reintroduced would be great. We have different disk sizes on our DNs.
I don't think it makes sense to have multiple entries (one per disk); that doesn't scale operationally.
Harsh J added a comment -

It seems there's still much demand for a du.pct approach over a byte-based approach. Is anyone willing to provide a patch for this? I can help review and commit.
John Meza added a comment -

I think dfs.datanode.du.pct works well for heterogeneous disks, especially when the disks have a wide range of capacities. When disk capacities are the same or very close, either dfs.datanode.du.pct or dfs.datanode.du.reserved would work.

Neither solves my needs well. I have an 8-DN cluster used for performance testing. On occasion I need some or all of these machines for other tasks. It would be great if I could reserve 300 GB on a couple of disks - not all of them, just a couple.

Maintaining a comma-separated list can lead to mistakes, especially for DNs with more than a couple of disks. To simplify this, specify reserves only for specific disks and let all others fall back to a default value.
For example, a DN with 8 disks: /fs1,/fs2,../fs8.

<name>dfs.datanode.du.reserved</name>
<value>10737418240, /fs1/dfs/dn:322122547200, /fs2/dfs/dn:322122547200</value>

This reserves 300 GB on /fs1 and /fs2, and the default 10 GB on all others.
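John's proposed value format could be parsed along these lines. This is a hedged sketch only; the parser below is illustrative and is not part of HDFS:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a parser for the proposed value format:
//   "<defaultBytes>, <path>:<bytes>, <path>:<bytes>, ..."
// A bare number sets the default; "path:bytes" overrides one volume.
// Illustrative only; not actual HDFS code.
class ReservedParser {
    final long defaultReserved;
    final Map<String, Long> perVolume = new HashMap<>();

    ReservedParser(String value) {
        long def = 0L;
        for (String part : value.split(",")) {
            part = part.trim();
            int colon = part.lastIndexOf(':');
            if (colon < 0) {
                def = Long.parseLong(part); // bare number = default reserve
            } else {
                perVolume.put(part.substring(0, colon),
                              Long.parseLong(part.substring(colon + 1)));
            }
        }
        defaultReserved = def;
    }

    // Reserved bytes for a volume: its override, else the default.
    long reservedFor(String volume) {
        return perVolume.getOrDefault(volume, defaultReserved);
    }
}
```

With John's example value, /fs1/dfs/dn and /fs2/dfs/dn would resolve to 322122547200 bytes (300 GB) and every other volume to the 10737418240-byte (10 GB) default.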

People

• Assignee: Unassigned
• Reporter: Aaron T. Myers
• Votes: 0
• Watchers: 14