Hadoop HDFS / HDFS-15693

Improve native code's performance when writing to HDFS


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fuse-dfs, native
    • Labels: None

    Description

      For reads, we introduced direct buffers to communicate more efficiently between the JVM and the native code; readDirect and pReadDirect are implemented in hdfs.c.

      Writes, on the other hand, still use the JNI SetByteArrayRegion call, which copies the native buffer into a Java byte array in memory.
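
      To make the extra copy concrete, here is a minimal sketch in plain Java with no Hadoop or JNI dependency. The class `HeapCopySketch` and its method `copyToHeap` are hypothetical names used only for illustration. The direct buffer stands in for memory owned by native code, and the array path mirrors what any `write(byte[])` style API requires: a second copy of the payload on the Java heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Hadoop.
class HeapCopySketch {
    // Array-based write path: the payload is duplicated into a new heap byte[].
    static byte[] copyToHeap(ByteBuffer src) {
        byte[] heap = new byte[src.remaining()];
        // duplicate() shares the content but has its own position,
        // so the caller's buffer is left untouched.
        src.duplicate().get(heap);
        return heap;
    }

    public static void main(String[] args) {
        byte[] payload = "hello hdfs".getBytes(StandardCharsets.US_ASCII);

        // Stand-in for a buffer that native code filled outside the Java heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(payload.length);
        direct.put(payload);
        direct.flip();

        // This allocation plus copy is the cost a direct-buffer write would avoid.
        byte[] heap = copyToHeap(direct);
        System.out.println("direct=" + direct.isDirect()
                + ", copied=" + heap.length + " bytes"
                + ", remaining=" + direct.remaining());
    }
}
```

      A write API that accepts the `ByteBuffer` itself would let the stream consume the bytes without allocating `heap` at all.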

      This Jira is to explore what has to be done in order to start using direct buffers for writes as well.
      A short initial list I see at the moment:

      • add a new StreamCapability for streams wanting to support writes via direct buffer
        - implement this capability in the DFSOutputStream and DFSStripedOutputStream
      • implement a writeDirect method on the native side
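
      The Java half of the list above could look roughly like the following sketch. It is written against locally defined stand-ins so that it compiles without Hadoop: `CapabilityProbe` mirrors the `hasCapability(String)` probe of Hadoop's `StreamCapabilities`, and `ByteBufferWritable` is a hypothetical write-side counterpart of the existing `ByteBufferReadable`. The capability string `"out:writebytebuffer"` and all class names are assumptions; the issue does not define them. The helper `writeFrom` shows the decision a native `writeDirect` would effectively trigger: use the buffer-based write when the stream advertises it, otherwise fall back to the array copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical stand-in for the hasCapability probe of StreamCapabilities.
interface CapabilityProbe {
    boolean hasCapability(String capability);
}

// Hypothetical interface for streams that accept a ByteBuffer directly.
interface ByteBufferWritable {
    int write(ByteBuffer buf) throws IOException;
}

// Toy stream that advertises and implements the hypothetical capability.
class DirectWriteSketchStream extends OutputStream
        implements CapabilityProbe, ByteBufferWritable {
    // Assumed capability name, not defined by Hadoop or by the issue.
    static final String WRITE_BYTEBUFFER = "out:writebytebuffer";

    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    @Override
    public void write(int b) {
        sink.write(b);
    }

    @Override
    public int write(ByteBuffer buf) {
        int n = buf.remaining();
        // A real stream would feed its packet pipeline here; the toy sink just drains the buffer.
        while (buf.hasRemaining()) {
            sink.write(buf.get());
        }
        return n;
    }

    @Override
    public boolean hasCapability(String capability) {
        return WRITE_BYTEBUFFER.equals(capability);
    }

    byte[] written() {
        return sink.toByteArray();
    }
}

class DirectWriteDemo {
    // Prefer the ByteBuffer path when advertised; otherwise copy into a byte[].
    static int writeFrom(OutputStream out, ByteBuffer buf) throws IOException {
        if (out instanceof CapabilityProbe
                && ((CapabilityProbe) out).hasCapability(DirectWriteSketchStream.WRITE_BYTEBUFFER)
                && out instanceof ByteBufferWritable) {
            return ((ByteBufferWritable) out).write(buf);
        }
        byte[] heap = new byte[buf.remaining()];
        buf.get(heap);   // fallback: the extra copy
        out.write(heap);
        return heap.length;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(4);
        buf.put(new byte[] {1, 2, 3, 4});
        buf.flip();
        DirectWriteSketchStream out = new DirectWriteSketchStream();
        System.out.println("bytes written: " + writeFrom(out, buf));
    }
}
```

      Probing the capability first keeps the fallback path available for streams that do not implement the buffer-based write, which matches how the read side can fall back when a stream lacks the corresponding read capability.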

      fuse_dfs can benefit from this.

          People

            Assignee: Unassigned
            Reporter: István Fajth (pifta)
