Hadoop HDFS
  HDFS-5682

Heterogeneous Storage phase 2 - APIs to expose Storage Types

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 2.7.0
    • Component/s: datanode, namenode
    • Labels: None

      Description

      Phase 1 (HDFS-2832) added support to present the DataNode as a collection of discrete storages of different types.

      This Jira is to track phase 2 of the Heterogeneous Storage work which involves exposing Storage Types to applications and adding Quota Management support for administrators.

      This phase will also include tools support for administrators/users.
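          For readers arriving later: the application-facing surface eventually shipped via HDFS-6584 (storage policies) and HDFS-7584 (quotas by storage type). As a rough sketch of what that looks like in Hadoop 2.6/2.7-era releases (the path below is illustrative, not from this Jira):

```shell
# List the storage policies the cluster supports (HOT, COLD, ONE_SSD, ALL_SSD, ...)
hdfs storagepolicies -listPolicies

# Ask that new blocks under a directory be placed on SSD-backed volumes
hdfs storagepolicies -setStoragePolicy -path /data/latency-sensitive -policy ALL_SSD

# Verify the policy applied to the path
hdfs storagepolicies -getStoragePolicy -path /data/latency-sensitive
```

          Note that this only takes effect if DataNode volumes are tagged with storage types in dfs.datanode.data.dir (e.g. a value like [SSD]file:///mnt/ssd1/dn).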

        Issue Links

          Activity

          Arpit Agarwal added a comment -

          Resolving per my earlier comment. This work was largely covered under HDFS-6584 and HDFS-7584.

          Arpit Agarwal added a comment -

          Most of the remaining work planned for this Jira has been completed under HDFS-6584 and HDFS-7584. There is also some overlap with HDFS-6581. I plan to resolve this Jira as obsolete soon.

          > Before you start trying to define multiple types of storage, you need to first consider the data flow in and out, what sort of processes occur on the storage, and then, for each type of storage, how you would implement it.
          > As an example: what happens to the data when there's an insert, compaction, major compaction, secondary indexing, etc., and for each of these categories, how will the new storage unit function?

          Hi Michael Segel, none of these are filesystem primitives. HDFS just makes raw storage types available for use. The kind of analysis you describe is relevant to components higher up in the stack.

          Michael Segel added a comment -

          You really can't start looking at this issue until you understand how it will impact things like scanning.
          Before you start trying to define multiple types of storage, you need to first consider the data flow in and out, what sort of processes occur on the storage, and then, for each type of storage, how you would implement it.

          As an example: what happens to the data when there's an insert, compaction, major compaction, secondary indexing, etc., and for each of these categories, how will the new storage unit function?

          Looking at the spec (PDF), this thought process seems to be missing.

          Vinayakumar B added a comment -

          Hi Neeta Garimella, most of this feature is being implemented as part of HDFS-6584 (Archival Storage Support).
          You can take a look at it.
          Thanks

          Vinayakumar B added a comment -

          Updated the target version to trunk (3.0.0) and removed the affected version, as this is a new feature.

          Neeta Garimella added a comment -

          Hi, I'm trying to determine when this feature will be released. The Hadoop roadmap wiki http://wiki.apache.org/hadoop/Roadmap lists this feature under 2.5, though the affected release in the header (at the top) is listed as 3.0. Could you clarify in which release this will be delivered, and the tentative time frame (4Q14 or 2015), to help us align our product roadmap?

          Thanks much for your response.

          Zesheng Wu added a comment -

          Thanks for the responses Arpit Agarwal

          > The function name should communicate that this is a disk space quota for a specific storage type, as opposed to the overall quotas, which are set with setQuota. If the proposed name is hard to follow, how about get/setQuotaByStorageType?

          Yes, get/setQuotaByStorageType will be clearer.

          > Let's defer this for now. The API and protocol can both be easily extended in a backwards-compatible manner in the future without affecting existing applications.

          OK

          > We have to differentiate between quota unavailability vs. disk space availability. The former will result in a quota violation exception; the latter will result in the behavior you described. We discuss the reasons for this in the HDFS-2832 design doc.

          Got it, thanks. I will look into the HDFS-2832 doc for more details.

          Arpit Agarwal added a comment -

          Thanks for the feedback Zesheng Wu. My responses are below.

          > 1. About the storage type: because I didn't participate in the discussion in HDFS-2832, I am confused by the current storage types DISK and SSD. I think SSD is also one type of disk; DISK and SSD are not orthogonal. Can we change the storage types to HDD and SSD? That would be more straightforward.

          Good point, I'll look into making the names clearer. In a subsequent revision of the API we would like to eliminate the hard-coded names from code altogether.

          > 2. About setStorageTypeSpaceQuota/getStorageTypeSpaceQuota: these two names are not very natural. From the literal meaning, it sounds like setting/getting a space quota on some storage type rather than on some type of storage. I would suggest that setStorageSpaceQuota/getStorageSpaceQuota would be better. I am not a native English speaker; if I am wrong, just ignore this.

          The function name should communicate that this is a disk space quota for a specific storage type, as opposed to the overall quotas, which are set with setQuota. If the proposed name is hard to follow, how about get/setQuotaByStorageType?

          > 3. About the command line, hdfs dfsadmin -get(set)StorageTypeSpaceQuota: I think getting/setting one storage type at a time is simple and straightforward; if we get/set more than one at once, because there's no atomicity guarantee, it's complicated to handle failure.

          Yes, I think we can simplify the command line as you suggested.

          > 4. About the StoragePreference class: as you said in the design doc in HDFS-2832, in the future HDFS will support placing replicas on different storages, such as 1 on SSD and 2 on HDD. I would suggest that the StoragePreference class support specifying the storage type of each replica now; in this way, we can easily support the above feature in the future.

          Let's defer this for now. The API and protocol can both be easily extended in a backwards-compatible manner in the future without affecting existing applications.

          > 5. About the create-file semantics: as you said in the doc, "During file creation there must be sufficient quota to place at least one block times the replication factor on the target storage type, otherwise the request is failed immediately with QuotaExceededException". I think it would be more natural and friendly to first create the file on the default storage (HDD) if there's not enough space of the desired storage type, and then let the namenode replicate the block to the desired storage lazily when enough space becomes available.

          We have to differentiate between quota unavailability vs. disk space availability. The former will result in a quota violation exception; the latter will result in the behavior you described. We discuss the reasons for this in the HDFS-2832 design doc.
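          The quota API debated in this exchange eventually landed in HDFS-7584 as setQuotaByStorageType. As a sketch of the admin workflow in Hadoop 2.7+ (the directory is illustrative):

```shell
# Cap how much SSD capacity the subtree may consume,
# independently of the overall space quota set via -setSpaceQuota without -storageType
hdfs dfsadmin -setSpaceQuota 100g -storageType SSD /data/latency-sensitive

# Remove the per-storage-type quota again
hdfs dfsadmin -clrSpaceQuota -storageType SSD /data/latency-sensitive
```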

          Zesheng Wu added a comment -

          I read through the API doc; it's really very clear. Following are some minor suggestions:
          1. About the storage type: because I didn't participate in the discussion in HDFS-2832, I am confused by the current storage types DISK and SSD. I think SSD is also one type of disk; DISK and SSD are not orthogonal. Can we change the storage types to HDD and SSD? That would be more straightforward.
          2. About setStorageTypeSpaceQuota/getStorageTypeSpaceQuota: these two names are not very natural. From the literal meaning, it sounds like setting/getting a space quota on some storage type rather than on some type of storage. I would suggest that setStorageSpaceQuota/getStorageSpaceQuota would be better. I am not a native English speaker; if I am wrong, just ignore this.
          3. About the command line, hdfs dfsadmin -get(set)StorageTypeSpaceQuota: I think getting/setting one storage type at a time is simple and straightforward; if we get/set more than one at once, because there's no atomicity guarantee, it's complicated to handle failure.
          4. About the StoragePreference class: as you said in the design doc in HDFS-2832, in the future HDFS will support placing replicas on different storages, such as 1 on SSD and 2 on HDD. I would suggest that the StoragePreference class support specifying the storage type of each replica now; in this way, we can easily support the above feature in the future.
          5. About the create-file semantics: as you said in the doc, "During file creation there must be sufficient quota to place at least one block times the replication factor on the target storage type, otherwise the request is failed immediately with QuotaExceededException". I think it would be more natural and friendly to first create the file on the default storage (HDD) if there's not enough space of the desired storage type, and then let the namenode replicate the block to the desired storage lazily when enough space becomes available.

          Zesheng Wu added a comment -

          Thanks Arpit Agarwal, I will take a look at the API doc and give my feedback soon.

          Arpit Agarwal added a comment -

          Attaching a doc with the proposed API. Thanks to Chris Nauroth and Tsz Wo Nicholas Sze for providing feedback on earlier drafts of the document.

          This is not a detailed design doc of course. I will post that in a few days.

          Nick Dimiduk added a comment -

          Have you considered making use of the SSD via BucketCache? I did some experiments with it a while back, to great success. Let's take this conversation to user@hbase if you'd like to discuss further.
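          For reference, the BucketCache setup alluded to here is configured in hbase-site.xml roughly as follows (the file path and size are illustrative; property names are from the HBase 0.98/1.x reference guide):

```xml
<!-- Use an SSD-backed file as the L2 block cache -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>file:/mnt/ssd/bucketcache.data</value>
</property>
<property>
  <!-- Cache size in MB (illustrative) -->
  <name>hbase.bucketcache.size</name>
  <value>8192</value>
</property>
```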

          Zesheng Wu added a comment -

          Thanks Arpit Agarwal. Looking forward to reading the API doc soon.

          Arpit Agarwal added a comment -

          > Last week I looked into HDFS-2832 and found that we need HDFS-5682 to be finished before heterogeneous storage can be used, so I asked about the progress of this issue. As you said this was deprioritized, I am wondering whether you can give a more specific plan, so I can spend some time working on some of the subtasks.

          Now that 2.4 is finished we'll have an API doc out soon.

          Zesheng Wu added a comment -

          Thanks Nick Dimiduk, here are some details of our SLA requirements:
          The background is that we have a structured data storage system, very similar to Amazon DynamoDB, built on HBase. This system is intended to serve multiple tenants, and in order to meet the SLA, such as 99th-percentile read/write latency, we need to place the data on SSD. One straightforward way is to set up an HDFS cluster with all-SSD data disks, but this costs too much, so we need to reduce the cost. Another way is to set up an HDFS cluster with both SSD and HDD data disks; we can put 1 replica on SSD and the other 2 on HDD. This is really a typical scenario for HDFS heterogeneous storage.
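          The "1 replica on SSD, 2 on HDD" layout described here is exactly what the ONE_SSD storage policy from HDFS-6584 later provided. As a sketch (the path is illustrative; the policy name may appear as One_SSD in some releases):

```shell
# Place one replica on SSD and the remaining replicas on DISK for this subtree
hdfs storagepolicies -setStoragePolicy -path /hbase -policy ONE_SSD
```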

          Nick Dimiduk added a comment -

          Hi Zesheng Wu. Can you describe your SLA requirements in detail? Maybe these needs can be addressed in other ways?

          Zesheng Wu added a comment -

          Thanks Arpit Agarwal. Recently some of our users have strong requirements for read/write SLAs on HBase, and we noticed that HDFS's heterogeneous storage feature is really interesting and helpful; thanks for your valuable work on this.
          Last week I looked into HDFS-2832 and found that we need HDFS-5682 to be finished before heterogeneous storage can be used, so I asked about the progress of this issue. As you said this was deprioritized, I am wondering whether you can give a more specific plan, so I can spend some time working on some of the subtasks.

          Arpit Agarwal added a comment -

          Hi Zesheng Wu, this was deprioritized due to 2.4 stabilization.

          We will post a document describing the proposed API within the next couple of weeks, followed shortly by the detailed design doc.

          Zesheng Wu added a comment -

          Hi Arpit Agarwal, how is this issue going? When do you plan to release a usable version of HSM?

          Arpit Agarwal added a comment -

          An updated design doc will be posted soon - please see HDFS-2832 for the current design.


            People

            • Assignee:
              Arpit Agarwal
              Reporter:
              Arpit Agarwal
            • Votes: 1
              Watchers: 61

              Dates

              • Created:
                Updated:
                Resolved:
