  1. Hadoop Common
  2. HADOOP-6450

Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs
    • Labels: None

    Description

      The current HDFS implementation has a limitation: it does not replicate the last partial block of a file that is being written until the file is closed. Some long-running applications (e.g. HBase) write transaction logs into HDFS. If datanodes in the write pipeline die, the application has no knowledge of it until all the datanodes in the pipeline have failed and the application gets an IO error.

      These applications would benefit greatly if they could determine the number of live replicas of the block they are currently writing. For example, an application could decide that when one of the datanodes in the write pipeline fails, it will close the file and start writing to a new file.
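
      The close-and-roll policy described above could look roughly like the sketch below. It is illustrative only and does not use the Hadoop API: the ReplicaCountable interface, its getNumCurrentReplicas() method, the in-memory FakeReplicatedStream, and RollingLogWriter are all hypothetical names made up for this example, and the "pipeline" is just a counter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

class ReplicaAwareLogDemo {

    /** Hypothetical interface (not part of Hadoop): exposes the live replica count. */
    interface ReplicaCountable {
        int getNumCurrentReplicas() throws IOException;
    }

    /** In-memory stand-in for an output stream backed by a datanode pipeline. */
    static class FakeReplicatedStream extends ByteArrayOutputStream implements ReplicaCountable {
        private int liveReplicas;

        FakeReplicatedStream(int liveReplicas) {
            this.liveReplicas = liveReplicas;
        }

        /** Simulates the loss of one datanode in the write pipeline. */
        void failOneDatanode() {
            if (liveReplicas > 0) {
                liveReplicas--;
            }
        }

        @Override
        public int getNumCurrentReplicas() {
            return liveReplicas;
        }
    }

    /** Log writer that closes the current file and opens a new one when a replica is lost. */
    static class RollingLogWriter<S extends OutputStream & ReplicaCountable> {
        private final int expectedReplicas;
        private final Supplier<S> newFile;
        private S current;
        private int rolls = 0;

        RollingLogWriter(int expectedReplicas, Supplier<S> newFile) {
            this.expectedReplicas = expectedReplicas;
            this.newFile = newFile;
            this.current = newFile.get();
        }

        void append(byte[] record) throws IOException {
            // Check the pipeline before each write; roll if it is under-replicated.
            if (current.getNumCurrentReplicas() < expectedReplicas) {
                current.close();
                current = newFile.get();
                rolls++;
            }
            current.write(record);
        }

        S current() {
            return current;
        }

        int rolls() {
            return rolls;
        }
    }

    /** Writes three records, losing one datanode after the first; returns the roll count. */
    static int simulate() {
        try {
            RollingLogWriter<FakeReplicatedStream> writer =
                    new RollingLogWriter<>(3, () -> new FakeReplicatedStream(3));
            writer.append("txn-1\n".getBytes(StandardCharsets.UTF_8));
            writer.current().failOneDatanode();
            writer.append("txn-2\n".getBytes(StandardCharsets.UTF_8));
            writer.append("txn-3\n".getBytes(StandardCharsets.UTF_8));
            return writer.rolls();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("log rolls: " + simulate());
    }
}
```

      In this sketch the writer checks the replica count before every append, so a failed datanode is noticed on the next write rather than only after the whole pipeline has failed. A real application would also have to decide how often to poll and how to handle a new file that is itself under-replicated.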

      Attachments

        1. Replicable.txt
          2 kB
          Dhruba Borthakur
        2. Replicable.txt
          2 kB
          Dhruba Borthakur


            People

              Assignee: Dhruba Borthakur
              Reporter: Dhruba Borthakur
              Votes: 0
              Watchers: 6
