Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: documentation
    • Labels: None

      Description

      This is a statistical model of data durability in HDFS in the presence of DataNode (DN) failures. The attached spreadsheet estimates the probability of losing a block with three replicas when DN failures are uncorrelated. It also includes a section that looks at the consequences of simultaneous failures.

      The model parameters reflect experience at Yahoo with a large cluster, but they are easy to change in the spreadsheet. The number of replicas, however, is not an easily adjusted parameter. And while published reports (the Google papers) suggest that node failures are not really uncorrelated, the model still gives some practical insight into HDFS durability.
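
      For readers without the spreadsheet at hand, the two questions above can be sketched in a few lines of Python. This is only an illustrative approximation, not the spreadsheet's actual formulas: it assumes replicas sit on distinct nodes chosen uniformly at random (ignoring rack-aware placement), that each node fails independently at a constant rate, and that a lost replica is restored within a fixed repair window. All function names, parameter names, and the sample values are hypothetical placeholders, not the Yahoo figures.

```python
import math


def block_loss_prob_simultaneous(n_nodes: int, n_failed: int, replicas: int = 3) -> float:
    """Probability that every replica of one block is on a failed node.

    Assumption: replicas are on distinct nodes chosen uniformly at random,
    and exactly n_failed of the n_nodes nodes fail at the same time.
    """
    if n_failed < replicas:
        return 0.0
    return math.comb(n_failed, replicas) / math.comb(n_nodes, replicas)


def expected_blocks_lost(n_blocks: int, n_nodes: int, n_failed: int, replicas: int = 3) -> float:
    """Expected number of blocks lost in one simultaneous failure event."""
    return n_blocks * block_loss_prob_simultaneous(n_nodes, n_failed, replicas)


def annual_loss_rate_per_block(node_failures_per_day: float,
                               repair_window_days: float,
                               replicas: int = 3) -> float:
    """Rough expected loss events per block per year for uncorrelated failures.

    Assumption: a block is lost when, after one replica's node fails, the nodes
    holding the remaining replicas each fail before re-replication completes.
    node_failures_per_day is the failure rate of a single node.
    Only meaningful when the result is much smaller than 1.
    """
    # Chance that one specific node fails inside the repair window.
    p_window = 1.0 - math.exp(-node_failures_per_day * repair_window_days)
    # How often some replica of the block suffers a first failure, per year.
    first_failures_per_year = replicas * node_failures_per_day * 365.0
    return first_failures_per_year * p_window ** (replicas - 1)


if __name__ == "__main__":
    # Placeholder inputs chosen only to exercise the functions.
    print(expected_blocks_lost(n_blocks=100_000_000, n_nodes=4000, n_failed=10))
    print(annual_loss_rate_per_block(node_failures_per_day=0.001, repair_window_days=0.02))
```

      The sketch keeps the replica count as an argument only for illustration; changing it here is trivial, which is not true of the spreadsheet model.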

      I and others have quoted from this work in the past. I thought it good to make the details conveniently available.

      1. LosingBlocks.xlsx
        199 kB
        Robert Chansler

        Activity

        chansler Robert Chansler added a comment -

        Model Spreadsheet

        szetszwo Tsz Wo Nicholas Sze added a comment -

        +1 great work!

        milindb Milind Bhandarkar added a comment -

        Rob, I think a convenient place for this would be on the Hadoop wiki.

        sureshms Suresh Srinivas added a comment -

        Thanks, Rob, for making the spreadsheet available.

        sri716 sri added a comment -

        Rob, can you also point me to your presentation slides from Hadoop Summit 2011, or any materials related to it?

        chansler Robert Chansler added a comment -

        At Hadoop Summit 2011: http://www.youtube.com/watch?v=zbycDpVWhp0
        Also referenced in ;login: February 2012: https://www.usenix.org/publications/login/february-2012/data-availability-and-durability-hadoop-distributed-file-system

        beeflyme caixiaofeng added a comment -

        mark it here.


          People

          • Assignee: chansler Robert Chansler
          • Reporter: chansler Robert Chansler
          • Votes: 0
          • Watchers: 12
