  1. Hadoop HDFS
  2. HDFS-11435

NameNode should track lengths of open-for-write files more frequently than on new block allocations

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      Problem:
      Currently, the length of an open-for-write (under-construction) file is updated on the NameNode only in the following cases:

      1. Block boundary: When a block fills and a new block is allocated, the NameNode learns of the file growth and its recorded file length catches up.
      2. hsync(SyncFlag.UPDATE_LENGTH): When a client application invokes hsync on the write stream with this special flag, DataNodes send an incremental block report with the latest file length, which the NameNode uses to update its metadata.
      3. First hflush() on a new block: When a client application calls hflush() for the first time on each new block, DataNodes notify the NameNode of the latest file length.
      4. Output stream close: Closing the stream forces DataNodes to update the NameNode with the file length, after the data has been persisted and properly acknowledged in the pipeline.

      As a result, the lengths the NameNode records for open-for-write files are usually much smaller than the lengths seen by the DataNodes and the client. It is highly preferable that the NameNode not lag in file length by on the order of a block size for under-construction files, and that it have a more frequent, scalable update mechanism for these open file lengths.
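
      To make the lag concrete, here is a minimal toy model of the four update triggers described above. It is not Hadoop code: the class OpenFileLengthModel, its method names, and the 128 MiB block size are all assumptions chosen for illustration, and the model ignores pipelines, acknowledgements, and block reports.

```java
/**
 * Toy model, not Hadoop code: compares the length a writer has produced
 * with the length the NameNode would have recorded under the four
 * update triggers listed in this issue. All names are hypothetical.
 */
class OpenFileLengthModel {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // assumed 128 MiB block size

    long clientLength = 0;            // bytes written by the client so far
    long nameNodeLength = 0;          // length currently recorded by the NameNode
    boolean flushedThisBlock = false; // has hflush() already run on the current block?

    /** Client writes data; the NameNode only notices when a block boundary is crossed. */
    void write(long bytes) {
        long blocksBefore = clientLength / BLOCK_SIZE;
        clientLength += bytes;
        long blocksAfter = clientLength / BLOCK_SIZE;
        if (blocksAfter > blocksBefore) {
            // Trigger 1: a new block is allocated, so completed blocks become visible.
            nameNodeLength = blocksAfter * BLOCK_SIZE;
            flushedThisBlock = false;
        }
    }

    /** Trigger 3: only the first hflush() on each block updates the NameNode. */
    void hflush() {
        if (!flushedThisBlock) {
            nameNodeLength = clientLength;
            flushedThisBlock = true;
        }
    }

    /** Trigger 2: hsync with the UPDATE_LENGTH flag always updates the NameNode. */
    void hsyncUpdateLength() {
        nameNodeLength = clientLength;
    }

    /** Trigger 4: closing the stream brings the NameNode fully up to date. */
    void close() {
        nameNodeLength = clientLength;
    }

    /** Bytes by which the NameNode's recorded length trails the client's. */
    long lag() {
        return clientLength - nameNodeLength;
    }

    public static void main(String[] args) {
        final long MiB = 1024L * 1024;
        OpenFileLengthModel f = new OpenFileLengthModel();
        f.write(10 * MiB);
        f.hflush();          // first hflush on this block: NameNode catches up
        f.write(100 * MiB);
        f.hflush();          // later hflush on the same block: no update
        System.out.println("lag in MiB: " + f.lag() / MiB);
    }
}
```

      In this model, a writer that flushes once early in a block and then keeps writing can leave the NameNode behind by nearly a full block, which is the gap this issue proposes to narrow.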

          People

            manojg Manoj Govindassamy
            manojg Manoj Govindassamy