Flume / FLUME-2701

Adding WebHDFS support

Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      I'm using HttpFS as an HDFS web gateway to receive data from Flume agents in another datacenter over the Internet or a WAN. In my case a gateway is necessary to minimize the footprint required to access HDFS, but the WebHDFS API does not support hsync(), which Flume requires.

      HDFS syncs all data and metadata to DataNode (DN) disk before a file is closed, and this also holds through the WebHDFS API. It seems to me that we can rely on this guarantee to keep data safe when hsync() is unavailable. Personally, I think this is much easier than adding hsync() support to WebHDFS/HttpFS.

      Basically, the idea is to keep the transaction open until the file rolls whenever the scheme of the HDFS URI is “webhdfs”.
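
      The idea above can be sketched roughly as follows. This is only an illustration and is not taken from the attached patches: the class WebHdfsSyncPolicy and its methods isWebHdfs and canCommit are hypothetical names, and the sketch assumes the sink can tell whether the current file has been rolled (closed). The scheme is read with a plain string check because Flume sink paths may contain escape sequences such as %Y, which java.net.URI would reject.

```java
/** Hypothetical helper for illustration; not part of Flume or the attached patches. */
final class WebHdfsSyncPolicy {

    private WebHdfsSyncPolicy() {
    }

    /** Returns true when the configured sink path uses the "webhdfs" scheme. */
    static boolean isWebHdfs(String hdfsPath) {
        // The scheme is everything before the first "://".
        int sep = hdfsPath.indexOf("://");
        return sep > 0 && hdfsPath.substring(0, sep).equalsIgnoreCase("webhdfs");
    }

    /**
     * Decides whether the channel transaction may be committed now.
     * For non-WebHDFS paths hsync() is assumed to be available, so the
     * transaction can commit after every batch. For WebHDFS paths the
     * transaction stays open until the file has been rolled (closed),
     * since close is the only durability point described in this issue.
     */
    static boolean canCommit(String hdfsPath, boolean fileRolled) {
        return !isWebHdfs(hdfsPath) || fileRolled;
    }
}
```

      With this sketch, canCommit("webhdfs://gateway:14000/flume/events", false) is false, so events stay in the channel until the roll closes the file. The trade-off is that a long roll interval keeps the transaction open for the same length of time.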

      Attachments

        1. webhdfs.1.patch
          12 kB
          Mark Grover
        2. webhdfs.2.patch
          16 kB
          Mark Grover

          People

            Assignee: Unassigned
            Reporter: Mark Grover
            Votes: 0
            Watchers: 2