Hadoop HDFS / HDFS-5442

Zero loss HDFS data replication for multiple datacenters


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Hadoop is architected to operate efficiently at scale and to tolerate the normal hardware failures that occur within a datacenter, but it is not designed today to handle the failure of a datacenter itself. Although HDFS is neither designed for nor deployed in configurations spanning multiple datacenters, replicating data from one location to another is common practice for disaster recovery and global service availability. Current solutions handle batch replication with data copy/export tools (e.g., DistCp). While these provide some backup capability for HDFS data, they cannot recover all of your HDFS data after a datacenter failure and have a fully operational Hadoop cluster up and running again in another datacenter in a matter of minutes. For disaster recovery from a datacenter failure, we should provide a fully distributed, zero-data-loss, low-latency, high-throughput, and secure HDFS data replication solution for multi-datacenter setups.
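      To make the gap concrete: the batch approach above typically amounts to scheduling periodic DistCp runs between the two clusters. The sketch below illustrates this, assuming the Hadoop 2.x DistCp Java API (org.apache.hadoop.tools.DistCp); the namenode addresses and paths are hypothetical placeholders.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.tools.DistCpOptions;

// Sketch of cross-datacenter batch replication with DistCp. Cluster
// addresses and paths are placeholders; a real deployment would run this
// on a schedule (e.g., from cron or Oozie) chosen to match the acceptable
// recovery-point objective.
public class CrossDcBatchCopy {
    public static void main(String[] args) throws Exception {
        Path source = new Path("hdfs://nn-dc1:8020/data/important");
        Path target = new Path("hdfs://nn-dc2:8020/data/important");

        // Hadoop 2.x constructor; Hadoop 3 replaced this with a builder.
        DistCpOptions options = new DistCpOptions(Arrays.asList(source), target);
        options.setSyncFolder(true); // copy only what changed since the last run

        // Submits a MapReduce copy job and blocks until it completes.
        new DistCp(new Configuration(), options).execute();
    }
}
```

      Each run copies a point-in-time view of the namespace, so anything written between runs is lost if the source datacenter fails; the achievable recovery point is bounded by the copy interval, which is precisely the gap a continuous, zero-data-loss replication pipeline would close.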

      Design and code for Phase-1 to follow soon.

      Attachments

        1. Disaster Recovery Solution for Hadoop.pdf (1.22 MB, Dian Fu)
        2. Disaster Recovery Solution for Hadoop.pdf (1.10 MB, Haifeng Chen)
        3. Disaster Recovery Solution for Hadoop.pdf (1.11 MB, Haifeng Chen)

      People

        Assignee: Dian Fu (dian.fu)
        Reporter: Avik Dey (avik_dey@yahoo.com)
        Votes: 9
        Watchers: 107

      Dates

        Created:
        Updated: