Hadoop HDFS / HDFS-2535

A Model for Data Durability


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: documentation
    • Labels: None

    Description

      This is a statistical model of data durability in HDFS in the presence of DataNode (DN) failures. The attached spreadsheet estimates the probability of losing a block with three replicas when DN failures are uncorrelated. It also includes a section that looks at the consequences of simultaneous failures.

      The model parameters reflect experience at Yahoo with a large cluster, but they are easy to change in the spreadsheet. The number of replicas, however, is not an easily adjusted parameter. And while published reports (the Google papers) suggest that node failures are not really uncorrelated, the model still gives some practical insight into HDFS durability.

      I and others have quoted from this work in the past, so I thought it good to make the details conveniently available.
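
      For readers without the spreadsheet at hand, the two cases described above can be illustrated with a minimal sketch. This is not the spreadsheet's actual model. It assumes replicas sit on distinct nodes chosen uniformly at random (ignoring rack-aware placement and re-replication), and every function name and parameter below is hypothetical.

```python
from math import comb


def p_block_lost_independent(p_node_fail: float, replicas: int = 3) -> float:
    """Chance that every replica of one block is lost when each node
    holding a replica fails independently with probability p_node_fail
    during the window of interest (assumption: no re-replication)."""
    return p_node_fail ** replicas


def p_block_lost_simultaneous(n_nodes: int, failed: int, replicas: int = 3) -> float:
    """Chance that all replicas of one block sit on the failed nodes when
    `failed` of `n_nodes` nodes go down at once (assumption: replicas are
    placed on distinct nodes chosen uniformly at random)."""
    if failed < replicas:
        return 0.0
    return comb(failed, replicas) / comb(n_nodes, replicas)


def expected_blocks_lost(n_blocks: int, n_nodes: int, failed: int, replicas: int = 3) -> float:
    """Expected number of blocks with no surviving replica after a
    simultaneous failure, by linearity of expectation."""
    return n_blocks * p_block_lost_simultaneous(n_nodes, failed, replicas)
```

      Calling expected_blocks_lost with a cluster's block count, node count, and the number of nodes lost at once gives a rough sense of scale. The inputs are for the reader to supply, since the spreadsheet's Yahoo-derived values are not reproduced here.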

      Attachments

        1. LosingBlocks.xlsx
          199 kB
          Robert Chansler


          People

            Assignee: Robert Chansler (chansler)
            Reporter: Robert Chansler (chansler)
            Votes: 0
            Watchers: 14

            Dates

              Created:
              Updated:
              Resolved:
