Flume: FLUME-247

Add Efficient HBase Sink with Flexible Writing of Event Attributes


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.4
    • Component/s: Sinks+Sources
    • Labels: None

    Description

      Implement an HBase output sink that writes event attributes into HBase records in a configurable manner (based on attribute names).
      It can be used efficiently together with a decorator that "prepares" event attributes to be written into HBase.
      The sink's implementation and behaviour are similar to FLUME-6. In addition to handling event attributes differently, this sink also allows configuring performance-relevant, HBase-specific options: the client-side write buffer for HTable and disabling writes to the WAL (write-ahead log), both of which are known to increase HBase write performance significantly.
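      To illustrate the decorator pairing, here is a minimal sketch of what a "preparing" step might do, assuming the attribute-key format from the sink's javadoc ("<attrPrefix><columnFamily>:<qualifier>", default prefix "2hb_", with the bare prefix holding the row key). The class PrepareForHBaseSketch, its prepare() method and the "k=v" body format are hypothetical, and events are modeled as a plain Map<String, byte[]> rather than Flume's event API. The point it shows is that the set of columns is decided at run time by the data itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical "preparing" step: turns "k=v" tokens of an event body into
 * attributes that an attr2hbase-style sink would pick up. Not Flume's API.
 */
public class PrepareForHBaseSketch {
    static final String PREFIX = "2hb_"; // the sink's documented default attrPrefix

    /** Builds a key in the "<attrPrefix><columnFamily>:<qualifier>" format. */
    static String attrKey(String prefix, String family, String qualifier) {
        return prefix + family + ":" + qualifier;
    }

    /**
     * Sets the row-key attribute (key equal to the bare prefix) and adds one
     * column to the given family per "k=v" token found in the body.
     */
    static void prepare(Map<String, byte[]> attrs, String body, String family, String rowKey) {
        attrs.put(PREFIX, rowKey.getBytes(StandardCharsets.UTF_8));
        for (String token : body.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq <= 0) continue; // skip tokens that are not "k=v"
            attrs.put(attrKey(PREFIX, family, token.substring(0, eq)),
                      token.substring(eq + 1).getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> attrs = new LinkedHashMap<>();
        prepare(attrs, "name=alice action=login", "user", "alice-0001");
        for (Map.Entry<String, byte[]> e : attrs.entrySet()) {
            System.out.println(e.getKey() + " -> " + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

      Running main() prints three attributes in insertion order: "2hb_ -> alice-0001", "2hb_user:name -> alice" and "2hb_user:action -> login". A body with more tokens would simply produce more columns, with no schema change needed.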

      This approach (among others) was discussed here: https://groups.google.com/a/cloudera.org/group/flume-dev/browse_thread/thread/629286f0a202c2ef. A short excerpt showing how this sink differs from the one being implemented in parallel:
      "The thing I want to stress is that it would be nice if the user had the ability to dynamically populate new columns in the rows. By that I mean that the user can a) define column names at run time and b) add whatever number of columns the logic decides to (also at run time, based on the data being processed). An example of when one would need this flexibility can be found in Google's Bigtable paper."

      From the javadoc of the sink implementation:

      The sink takes the following parameters: attr2hbase("table" [,"family"[,"attrPrefix"[,"writeBufferSize"[,"writeToWal"]]]]).
      "table" - Name of the HBase table to write to.
      "family" - Name of the column family used to store "system" data (the event's timestamp and host).
      If this parameter is absent or equal to "", the sink does not write "system" data.
      "attrPrefix" - Attributes whose keys start with this prefix are written to the HBase table. Default value: "2hb_".
      An attribute key should have the format "<attrPrefix><columnFamily>:<qualifier>";
      e.g. "2hb_user:name" means that its value is placed into the "user" column family with the "name" qualifier.
      The attribute with key "<attrPrefix>" should contain the row key for the Put; if that attribute is absent, the event's getNanos() value is used as the row key.
      "writeBufferSize" - If provided, autoFlush for the HTable is set to "false" and the HTable's write buffer size is set to this value.
      This setting helps boost HBase write speed.
      "writeToWal" - Determines whether the WAL is used when writing to HBase.
      Disabling it boosts HBase write speed but lowers reliability: recent edits can be lost if a region server fails before they are flushed. Use it only if you understand the trade-off.
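      The parameter and attribute rules above can be sketched in plain Java as follows. This is a hypothetical illustration, not the sink's actual code: the class and method names are made up, no Flume or HBase classes are used, and two details are assumptions not stated in the javadoc (the default for "writeToWal" is taken to be true, and the getNanos() fallback row key is encoded as 8 big-endian bytes, as HBase's Bytes.toBytes(long) would produce). Attributes without the prefix, and prefixed keys lacking a "family:" part, are skipped.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of attr2hbase(...) argument defaults and attribute mapping. */
public class Attr2HBaseSketch {

    /** Parsed sink arguments. */
    static final class SinkConfig {
        String table;
        String systemFamily;        // null: do not write "system" data
        String attrPrefix = "2hb_"; // documented default
        Long writeBufferSize;       // non-null: autoFlush would be set to false
        boolean writeToWal = true;  // assumption: WAL stays on unless disabled

        static SinkConfig parse(String... args) {
            if (args.length < 1 || args.length > 5) {
                throw new IllegalArgumentException("expected 1 to 5 arguments");
            }
            SinkConfig c = new SinkConfig();
            c.table = args[0];
            if (args.length > 1 && !args[1].isEmpty()) c.systemFamily = args[1];
            if (args.length > 2) c.attrPrefix = args[2];
            if (args.length > 3) c.writeBufferSize = Long.parseLong(args[3]);
            if (args.length > 4) c.writeToWal = Boolean.parseBoolean(args[4]);
            return c;
        }
    }

    /** One cell destined for the Put. */
    static final class Cell {
        final String family;
        final String qualifier;
        final byte[] value;
        Cell(String family, String qualifier, byte[] value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Row key: the value of attribute "<attrPrefix>", else the event's nanos. */
    static byte[] rowKey(Map<String, byte[]> attrs, String prefix, long nanos) {
        byte[] key = attrs.get(prefix);
        // Assumption: nanos encoded as 8 big-endian bytes, like Bytes.toBytes(long).
        return key != null ? key : ByteBuffer.allocate(8).putLong(nanos).array();
    }

    /** One cell per "<attrPrefix><family>:<qualifier>" attribute. */
    static List<Cell> cells(Map<String, byte[]> attrs, String prefix) {
        List<Cell> out = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : attrs.entrySet()) {
            String key = e.getKey();
            // Skip attributes without the prefix and the row-key attribute itself.
            if (!key.startsWith(prefix) || key.length() == prefix.length()) continue;
            String column = key.substring(prefix.length());
            int colon = column.indexOf(':');
            if (colon <= 0) continue; // assumption: keys without "family:" are ignored
            out.add(new Cell(column.substring(0, colon), column.substring(colon + 1), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        SinkConfig cfg = SinkConfig.parse("users", "sys", "2hb_", "1048576", "false");
        Map<String, byte[]> attrs = new TreeMap<>();
        attrs.put("2hb_", "row-42".getBytes(StandardCharsets.UTF_8));
        attrs.put("2hb_user:name", "alice".getBytes(StandardCharsets.UTF_8));
        attrs.put("service", "web".getBytes(StandardCharsets.UTF_8)); // no prefix: not written
        String row = new String(rowKey(attrs, cfg.attrPrefix, 123L), StandardCharsets.UTF_8);
        for (Cell c : cells(attrs, cfg.attrPrefix)) {
            System.out.println(row + " " + c.family + ":" + c.qualifier + "="
                + new String(c.value, StandardCharsets.UTF_8));
        }
    }
}
```

      With the sample attributes in main(), the only cell produced is user:name=alice in row "row-42"; the "service" attribute has no prefix and is ignored. In HBase client versions of that era, the two performance options would correspond to HTable.setAutoFlush(false) together with HTable.setWriteBufferSize(long), and to Put.setWriteToWAL(boolean).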

      The sink class also implements the getSinkBuilders() method, so it can be loaded as a Flume extension plugin (see the flume.plugin.classes property in flume-site.xml for configuration details).


            People

              Assignee: Alex Baranau (alexb)
              Reporter: Alex Baranau (alexb)
              Votes: 0
              Watchers: 4
