Hadoop HDFS / HDFS-457

Better handling of volume failure in DataNode storage

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.203.0, 0.21.0
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: The DataNode can continue operating if a volume used for replica storage fails. Previously the DataNode shut down if any volume failed.

    Description

      The current implementation shuts the DataNode down completely when any one of its configured storage volumes fails.
      This is wasteful: it decreases utilization (the remaining good storage becomes unavailable) and imposes extra load on the cluster (the blocks on the healthy volumes must be re-replicated). These problems will only become more prominent as we move to mixed (heterogeneous) clusters with many more volumes per DataNode.
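      As an illustration of the eventual configuration surface, later Hadoop releases expose this tolerance through the `dfs.datanode.failed.volumes.tolerated` property in hdfs-site.xml (the property name postdates this original patch; the directory paths below are hypothetical):

      ```xml
      <!-- hdfs-site.xml: number of volume failures a DataNode tolerates
           before shutting itself down. The default, 0, preserves the
           pre-HDFS-457 behavior of stopping on the first failed volume. -->
      <property>
        <name>dfs.datanode.failed.volumes.tolerated</name>
        <value>1</value>
      </property>

      <!-- With several data directories configured, losing one volume
           no longer takes the whole DataNode offline. -->
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
      </property>
      ```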

      Attachments

        1. HDFS_457.patch
          2 kB
          Jeff Zhang
        2. HDFS-457_20-append.patch
          27 kB
          Nicolas Spiegelberg
        3. HDFS-457.patch
          29 kB
          Boris Shkolnik
        4. HDFS-457-1.patch
          29 kB
          Boris Shkolnik
        5. HDFS-457-2.patch
          29 kB
          Boris Shkolnik
        6. HDFS-457-2.patch
          29 kB
          Boris Shkolnik
        7. HDFS-457-2.patch
          28 kB
          Boris Shkolnik
        8. HDFS-457-3.patch
          29 kB
          Boris Shkolnik
        9. HDFS-457-y20.patch
          15 kB
          Konstantin Shvachko
        10. jira.HDFS-457.branch-0.20-internal.patch
          16 kB
          Erik Steffl
        11. TestFsck.zip
          689 kB
          Tsz-wo Sze

            People

              Assignee: Boris Shkolnik (boryas)
              Reporter: Boris Shkolnik (boryas)
              Votes: 0
              Watchers: 15

              Dates

                Created:
                Updated:
                Resolved: