  1. Hadoop HDFS
  2. HDFS-14229

Nonblocking HDFS create|write


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None

    Description

      Currently, the create call on HDFS is blocking.  The write call can also block if the write buffer has reached its limit.

      However, for most applications, the only requirement is that once "close" has been called on a file, the file is persisted and visible in HDFS.  There is no need for the new file to be visible as soon as the "create" call returns.

      A particular use case is using HDFS to store shuffle data (in Spark, MapReduce, or other loosely coupled applications).

       

      This Jira proposes adding a new "async-hdfs://" protocol that maps to a new AsyncDistributedFileSystem class.  Its create call is nonblocking and returns an FSOutputStream whose writes are also nonblocking, even when the file has not yet been physically created on HDFS; a write may block only when a write buffer limit is specified and reached.  The close call on the FSOutputStream blocks until the creation and all previous writes have completed and the file is closed.
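
      The proposed semantics can be sketched without any Hadoop dependency. The class below is a hypothetical illustration, not the proposed AsyncDistributedFileSystem or any existing Hadoop API: the name DeferredCreateOutputStream, the IoStep helper, and the Callable standing in for the blocking HDFS create are all assumptions. It queues the create and every write on a single background thread, and only close() waits. For brevity it buffers without limit, so the optional write buffer limit mentioned above is not modeled, and it assumes a single writer thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: create and write return at once, close() blocks. */
class DeferredCreateOutputStream extends OutputStream {
    /** A queued step that may throw, such as the create or one write. */
    interface IoStep { void run() throws Exception; }

    // One worker thread keeps create, writes, and close in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private OutputStream target;         // touched only on the worker thread
    private volatile Exception failure;  // first error seen by the worker
    private boolean closed;              // touched only by the caller thread

    /** blockingCreate stands in for the blocking create call (assumption). */
    DeferredCreateOutputStream(Callable<OutputStream> blockingCreate) {
        enqueue(() -> target = blockingCreate.call());
    }

    // Queue a step; once one step has failed, later steps are skipped.
    private void enqueue(IoStep step) {
        worker.submit(() -> {
            if (failure == null) {
                try { step.run(); } catch (Exception e) { failure = e; }
            }
        });
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        if (closed) throw new IOException("stream closed");
        // Copy the bytes so the caller may reuse its buffer immediately.
        byte[] copy = Arrays.copyOfRange(buf, off, off + len);
        enqueue(() -> target.write(copy));
    }

    /** Blocks until the create, all queued writes, and the close are done. */
    @Override
    public void close() throws IOException {
        if (closed) return;
        closed = true;
        worker.submit(() -> {
            // Close the target even if an earlier write failed.
            try { if (target != null) target.close(); }
            catch (Exception e) { if (failure == null) failure = e; }
        });
        worker.shutdown();
        try {
            worker.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while closing", e);
        }
        if (failure != null) throw new IOException("deferred create or write failed", failure);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DeferredCreateOutputStream(() -> {
            Thread.sleep(200);  // simulate a slow create round trip
            return sink;
        })) {
            out.write("shuffle block".getBytes(StandardCharsets.UTF_8));  // returns at once
        }  // close() waits here for the create and the write
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));  // prints "shuffle block"
    }
}
```

      One consequence of this design is that a failed create or write is reported only when close() is called, so callers that rely on early error detection would need an additional mechanism.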

       

      Note that this Jira is related to, but distinct from, https://issues.apache.org/jira/browse/HDFS-9924.  HDFS-9924 covers async rename and other operations; this Jira covers async create|write.

          People

            Assignee: Unassigned
            Reporter: Zheng Shao (zshao)
            Votes: 0
            Watchers: 7
