HDFS-1539: Prevent data loss when a cluster suffers a power loss

    Details

    • Hadoop Flags:
      Reviewed

      Description

      We have seen an instance where an external outage caused many datanodes to reboot at around the same time, which resulted in many corrupted blocks. The corrupted blocks were all recently written; the current HDFS Datanode implementation does not sync the data of a block file when the block is closed. Three remedies are proposed:

      1. Add a cluster-wide config setting that causes the datanode to sync a block file when the block is finalized.
      2. Introduce a new parameter to FileSystem.create() to trigger the new behaviour, i.e. cause the datanode to sync a block file when it is finalized.
      3. Implement FSDataOutputStream.hsync() to cause all data written to the specified file to be written to stable storage (all three are sketched below).
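      A rough sketch of how a client might exercise all three mechanisms. Only hsync() is named in the proposal above; the config key dfs.datanode.synconclose and the CreateFlag.SYNC_BLOCK create flag are assumptions used here for illustration, not part of the proposal text.

        import java.nio.charset.StandardCharsets;
        import java.util.EnumSet;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.CreateFlag;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.fs.permission.FsPermission;

        public class SyncOnCloseExample {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // (1) Cluster-wide setting: ask datanodes to sync a block file to
            // disk when the block is finalized. The key name is an assumption;
            // in practice it would live in hdfs-site.xml on the datanodes.
            conf.setBoolean("dfs.datanode.synconclose", true);

            FileSystem fs = FileSystem.get(conf);
            Path p = new Path("/tmp/durable-file");

            // (2) Per-file setting: a create flag requesting sync-on-finalize
            // for this file only (SYNC_BLOCK is an assumed flag name).
            FSDataOutputStream out = fs.create(
                p,
                FsPermission.getFileDefault(),
                EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE,
                           CreateFlag.SYNC_BLOCK),
                4096,                          // buffer size
                fs.getDefaultReplication(p),
                fs.getDefaultBlockSize(p),
                null);                         // no progress callback

            out.write("important record\n".getBytes(StandardCharsets.UTF_8));

            // (3) Explicit durability point: hsync() flushes buffered data to
            // the datanodes and asks them to write it to stable storage
            // before returning, so a subsequent power loss cannot corrupt it.
            out.hsync();

            out.close();
            fs.close();
          }
        }

      Syncing on finalize trades some write latency for a bound on data loss after a power failure, which is presumably why the proposal makes it an opt-in config setting rather than the default.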

        Attachments

      1. syncOnClose1.txt (6 kB, dhruba borthakur)
      2. syncOnClose2_b-1.txt (6 kB, Tsz Wo Nicholas Sze)
      3. syncOnClose2.txt (6 kB, dhruba borthakur)


          People

          • Assignee: dhruba borthakur
          • Reporter: dhruba borthakur
          • Votes: 0
          • Watchers: 22
