Details

    • Type: Brainstorming
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Consider how we might enable tiered HFile storage. If HDFS has the capability, we could create on solid state devices those files that are likely to be frequently accessed, especially for random reads, and create the others (the default) on spinning media as before. We could also support moving frequently read HFiles from spinning media to solid state. We already have CF statistics for this, so we would only need to add the requisite admin interface; we could even consider an autotiering option.
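As a rough illustration of what an autotiering option might decide, here is a minimal sketch of a threshold policy with hysteresis. It assumes a per-HFile read rate can be derived from the CF statistics mentioned above; StorageTier, HFileStats, AutoTierPolicy and the two thresholds are hypothetical names for illustration, not HBase APIs. The policy only decides where a file should live, not how to move it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical storage tiers; not an HDFS or HBase type.
enum StorageTier { SSD, HDD }

// Hypothetical per-HFile read statistics, e.g. derived from CF metrics.
final class HFileStats {
    final String path;
    final double readsPerSecond;
    final StorageTier currentTier;

    HFileStats(String path, double readsPerSecond, StorageTier currentTier) {
        this.path = path;
        this.readsPerSecond = readsPerSecond;
        this.currentTier = currentTier;
    }
}

// Promote hot files to SSD, demote cold files to HDD, and leave files whose
// read rate falls between the two thresholds on their current tier.
final class AutoTierPolicy {
    private final double promoteAbove;
    private final double demoteBelow;

    AutoTierPolicy(double promoteAbove, double demoteBelow) {
        if (demoteBelow >= promoteAbove) {
            throw new IllegalArgumentException("demoteBelow must be < promoteAbove");
        }
        this.promoteAbove = promoteAbove;
        this.demoteBelow = demoteBelow;
    }

    // The tier this file should live on after this evaluation.
    StorageTier targetTier(HFileStats f) {
        if (f.currentTier == StorageTier.HDD && f.readsPerSecond > promoteAbove) {
            return StorageTier.SSD;
        }
        if (f.currentTier == StorageTier.SSD && f.readsPerSecond < demoteBelow) {
            return StorageTier.HDD;
        }
        return f.currentTier;
    }

    // Files whose target tier differs from their current tier, as "path -> TIER".
    List<String> filesToMove(List<HFileStats> files) {
        List<String> moves = new ArrayList<>();
        for (HFileStats f : files) {
            StorageTier target = targetTier(f);
            if (target != f.currentTier) {
                moves.add(f.path + " -> " + target);
            }
        }
        return moves;
    }
}
```

With promoteAbove = 100 and demoteBelow = 10 reads/s, a file read 50 times per second stays on whichever tier it already occupies; the gap between the thresholds keeps a file near one boundary from being rewritten back and forth.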

      Dhruba Borthakur did some early work in this area and wrote up his findings: http://hadoopblog.blogspot.com/2012/05/hadoop-and-solid-state-drives.html . The findings are important to note, but I suggest most of the recommendations are out of scope for this JIRA. This JIRA seeks to find an initial use case that produces a reasonable benefit and serves as a testbed for further improvements.

      If I may paraphrase Dhruba's findings (any misstatements and errors are mine): First, the DFSClient code paths introduce significant latency, so the HDFS client (and presumably the DataNode, as the next bottleneck) will need significant work to reduce it. We need to investigate optimized (perhaps read-only) DFS clients and server-side read and caching strategies. Second, RegionServers are heavily threaded, which imposes considerable monitor contention and context-switching cost. We need to investigate reducing the number of threads in a RegionServer, as well as nonblocking IO and RPC.

          Activity

          Andrew Purtell added a comment:

          Suresh Srinivas: Agreed, HDFS-2832 is it.

          Suresh Srinivas added a comment:

          HDFS-3672 could be viewed as the beginnings of an API for device aware block placement (the query side).

          I am not sure how HDFS-3672 helps in this regard. However, I think HDFS-2832 does; that was the main intent of that JIRA.

          Andrew Purtell added a comment:

          HDFS-3672 could be viewed as the beginnings of an API for device aware block placement (the query side).

          If:

          • HDFS had an API for querying the nature of its underlying storage volumes
          • That API can distinguish solid state from spinning media
          • HDFS also adds an API for specifying block placement according to the storage volume type, perhaps as a new argument to FileSystem#create.

          then the above would allow us to rewrite, probably through compaction, HFiles from one media type to another.
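The three conditions above can be sketched as a small interface: a query side that reports which media types exist, and a create call that takes a preferred media type, so that a compaction can write its output onto a different tier from its input. Everything here is hypothetical (MediaType, DeviceAwareFileSystem, InMemoryDeviceAwareFs, MediaRewriter); it is not Hadoop's FileSystem API, the in-memory class stands in for a cluster only so the sketch runs, and the "compaction" is reduced to a byte-for-byte copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical media types a DataNode volume could report.
enum MediaType { SSD, SPINNING }

// Hypothetical device-aware filesystem client (NOT Hadoop's FileSystem API).
interface DeviceAwareFileSystem {
    // Query side: which media types the underlying storage volumes provide.
    Set<MediaType> availableMediaTypes();

    // Placement side: like FileSystem#create, plus a preferred media type.
    OutputStream create(String path, MediaType preferred) throws IOException;

    byte[] read(String path) throws IOException;

    // Where the file's data actually landed.
    MediaType mediaOf(String path) throws IOException;
}

// In-memory stand-in so the sketch runs without a cluster.
final class InMemoryDeviceAwareFs implements DeviceAwareFileSystem {
    private final Set<MediaType> volumes;
    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, MediaType> media = new HashMap<>();

    InMemoryDeviceAwareFs(Set<MediaType> volumes) {
        this.volumes = EnumSet.copyOf(volumes);
    }

    @Override
    public Set<MediaType> availableMediaTypes() {
        return EnumSet.copyOf(volumes);
    }

    @Override
    public OutputStream create(String path, MediaType preferred) {
        // Fall back to spinning media (the default) if no volume of the preferred type exists.
        final MediaType placed = volumes.contains(preferred) ? preferred : MediaType.SPINNING;
        return new ByteArrayOutputStream() {
            @Override
            public void close() {
                // Publish the file and its placement once the writer closes it.
                data.put(path, toByteArray());
                media.put(path, placed);
            }
        };
    }

    @Override
    public byte[] read(String path) throws IOException {
        byte[] bytes = data.get(path);
        if (bytes == null) {
            throw new FileNotFoundException(path);
        }
        return bytes.clone();
    }

    @Override
    public MediaType mediaOf(String path) throws IOException {
        MediaType m = media.get(path);
        if (m == null) {
            throw new FileNotFoundException(path);
        }
        return m;
    }
}

// Hypothetical helper: rewrite an HFile onto another tier, as a compaction might.
final class MediaRewriter {
    private MediaRewriter() {}

    static void rewriteTo(DeviceAwareFileSystem fs, String src, String dst, MediaType target)
            throws IOException {
        // A real compaction would merge and re-encode cells; here we only copy bytes.
        byte[] contents = fs.read(src);
        try (OutputStream out = fs.create(dst, target)) {
            out.write(contents);
        }
    }
}
```

Falling back to spinning media when no volume of the preferred type exists mirrors the default placement in the description; a real API would also have to decide whether that fallback is silent or reported to the caller.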

          If additionally parts of the namespace can be tagged with storage device affinity, perhaps by setting an attribute on a directory, then file blocks could be migrated in the background via FileSystem#rename() as a natural part of replication: If a DataNode reports block X on media type Y but the path prefers type Z, then the NameNode should figure out how to move the block if possible.
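The background migration in the last paragraph amounts to a reconciliation pass on the NameNode: compare each reported replica's media type with the type preferred by the nearest tagged ancestor directory, and schedule a move when they disagree. A minimal sketch, assuming directory tags are a simple directory-to-media map and block reports are (block, path, media) triples; Media, BlockReport, PlacementReconciler and preferredMedia() are hypothetical names, not NameNode internals, and whether a move is actually possible (free space on a volume of the right type) is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical media types, as a DataNode might report them per volume.
enum Media { SSD, SPINNING }

// One replica as reported by a DataNode: block X of a file, stored on media Y.
final class BlockReport {
    final long blockId;
    final String filePath;
    final Media media;

    BlockReport(long blockId, String filePath, Media media) {
        this.blockId = blockId;
        this.filePath = filePath;
        this.media = media;
    }
}

final class PlacementReconciler {
    // Directory -> preferred media, e.g. "/hbase/hot" -> SSD.
    private final Map<String, Media> affinity = new HashMap<>();

    void tagDirectory(String dir, Media preferred) {
        affinity.put(stripTrailingSlash(dir), preferred);
    }

    // Preference of the nearest tagged ancestor directory, or null if none is tagged.
    Media preferredMedia(String filePath) {
        String p = stripTrailingSlash(filePath);
        while (!p.isEmpty()) {
            int slash = p.lastIndexOf('/');
            if (slash < 0) {
                break;
            }
            p = p.substring(0, slash); // walk up one directory level
            Media m = affinity.get(p.isEmpty() ? "/" : p);
            if (m != null) {
                return m;
            }
        }
        return null;
    }

    // Replicas on media Y whose path prefers media Z != Y, as "blk_X: Y -> Z".
    List<String> pendingMoves(List<BlockReport> reports) {
        List<String> moves = new ArrayList<>();
        for (BlockReport r : reports) {
            Media wanted = preferredMedia(r.filePath);
            if (wanted != null && wanted != r.media) {
                moves.add("blk_" + r.blockId + ": " + r.media + " -> " + wanted);
            }
        }
        return moves;
    }

    private static String stripTrailingSlash(String s) {
        return (s.length() > 1 && s.endsWith("/")) ? s.substring(0, s.length() - 1) : s;
    }
}
```

Under this model a FileSystem#rename() into a tagged directory needs no special handling: the renamed file's nearest tagged ancestor, and therefore its preferred media, changes, and the next pass over block reports finds the mismatch.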


            People

            • Assignee:
              Andrew Purtell
              Reporter:
              Andrew Purtell
            • Votes:
              0
              Watchers:
              33
