Uploaded image for project: 'Flume'
  1. Flume
  2. FLUME-506

Sink that writes output to a sequenceFile in hdfs

    XMLWordPrintableJSON

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Duplicate
    • Affects Version/s: 0.9.1u1
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:

      Description

      I have written a code to be able to write to hdfs in this style:
      roll(1000) [ escapedCustomDfs("hdfs://namenode/flume/file-%

      {rolltag}") ]
      But writing each event body as value in a SequenceFile (and NullWritable as key).
      As a possible scenario, there would be a client sending flume events to a rpcSource. The body of each event is a thrift object. I have a sink that opens/closes a SequenceFile every X minutes, writes the body of each event (a thrift object converted to bytes) as BytesWritable as the Value of the SequenceFile and writes the key as NullWritable.
      The purpose of this is that these SequenceFiles can be used as input in a MapReduce Job. Each BytesWritable value can be easily deserialized into a Thrift object (the initially generated by the client and set to the event body) in the map phase.
      The plugins are (placed in the plugins folder in the zip):
      com.cloudera.flume.handlers.hdfs.EscapedThriftSeqfileDfsSink
      com.cloudera.flume.handlers.hdfs.ThriftSeqfileDfsSink

      An example of node configuration:
      node.name : tSource(38575) | roll(60000) { escapedThriftSeqfileDfsSink("hdfs://host.local/user/flume/file-%{rolltag}

      ")}

      The zip contains also a couple of test:

      1.-Php client that generates thrift objects and sends them via rpc to the source (The example objects are instances of TObject class, generated with thrift). This can be found in /test/php_rpc_writer.

      2.-Java app that reads the sequenceFile generated by flume. The file has to be manually downloaded from hdfs and we must specify the path where we place it in the app. This app has the same TObject class than the php so it can deserialize the BytesWritable from the SequenceFile to the proper thrift object.
      *The examples have hardcoded paths which should be properly set.

        Attachments

        1. ASF.LICENSE.NOT.GRANTED--sink_plugins.zip
          32 kB
          Disabled imported user
        2. ASF.LICENSE.NOT.GRANTED--collector_sink_plugins.zip
          47 kB
          Disabled imported user
        3. ASF.LICENSE.NOT.GRANTED--collector_sink_plugins.zip
          49 kB
          Disabled imported user
        4. ASF.LICENSE.NOT.GRANTED--FLUME-506.patch
          23 kB
          Disabled imported user

          Issue Links

            Activity

              People

              • Assignee:
                jmhsieh Jonathan Hsieh
                Reporter:
                flume_sturlese Disabled imported user
              • Votes:
                0 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: