  1. Flume
  2. FLUME-776

Create generic APIs for input / output formats and serialization


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: NG alpha 1
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels: None

    Description

      Flume should have a generic set of APIs to handle input and output formats as well as event serialization.

      These APIs should offer the same level of abstraction as Hadoop's InputFormat, OutputFormat, RecordReader, RecordWriter, and serializer interfaces / classes. The only reason not to use Hadoop's own implementation of these APIs is that we want to avoid that dependency and everything that comes with it. Examples of API usage would be:

      • HDFS sink, text file output, events serialized as JSON
      • HDFS sink, text file output, events serialized as text, Snappy compressed
      • HDFS sink, Avro file output, events serialized as Avro records, GZIP compressed
      • HBase sink, event fields[1] serialized as Thrift

      [1] The case of HBase is odd in that the event needs to be broken into individual fields (i.e. extracted to a complex type). This means some kind of custom mapping / extraction code or configuration needs to be supplied by the user; we're not overly concerned with that for this issue.
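
      To make the footnote concrete, here is a minimal sketch of what user-supplied extraction code might look like. Everything in it is an assumption for illustration: FieldExtractor and DelimitedFieldExtractor are hypothetical names, not Flume or HBase APIs, and the delimiter-based mapping is only one possible strategy. Serializing the resulting fields (for example as Thrift) is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical contract: break an opaque event body into named fields. */
interface FieldExtractor {
    Map<String, String> extract(byte[] eventBody);
}

/** Hypothetical implementation: split a UTF-8 body on a literal delimiter. */
final class DelimitedFieldExtractor implements FieldExtractor {
    private final String delimiter;
    private final List<String> fieldNames;

    DelimitedFieldExtractor(String delimiter, List<String> fieldNames) {
        this.delimiter = delimiter;
        this.fieldNames = fieldNames;
    }

    @Override
    public Map<String, String> extract(byte[] eventBody) {
        String text = new String(eventBody, StandardCharsets.UTF_8);
        // Pattern.quote treats the delimiter literally; -1 keeps trailing empty fields.
        String[] parts = text.split(Pattern.quote(delimiter), -1);
        Map<String, String> fields = new LinkedHashMap<>();
        // Map positions to names; extra names or extra parts are ignored.
        for (int i = 0; i < fieldNames.size() && i < parts.length; i++) {
            fields.put(fieldNames.get(i), parts[i]);
        }
        return fields;
    }
}
```

      A sink could hold a FieldExtractor chosen by configuration and write each returned entry to its own column, without knowing how the fields were derived.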

      The implementations of the formats (text file, Avro), serializations (JSON, Avro, Thrift), and compression codecs (Snappy, GZIP) listed above are just examples. We'll open separate JIRAs for implementations. The scope of this JIRA is the framework / infrastructure.
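
      As a rough illustration of the separation this issue asks for, the sketch below keeps serialization and compression behind two small interfaces so that the writing code knows about neither. All type names (SketchEvent, EventSerializer, CompressionCodec, TextLineSerializer, GzipCodec, FormatSketch) are hypothetical and are not the API that was eventually committed. GZIP is used only because it ships with the JDK, and an in-memory buffer stands in for the HDFS file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for a Flume event: just an opaque body. */
final class SketchEvent {
    final byte[] body;
    SketchEvent(String text) { this.body = text.getBytes(StandardCharsets.UTF_8); }
}

/** Hypothetical: writes one event to a stream in some serialization. */
interface EventSerializer {
    void write(SketchEvent event, OutputStream out) throws IOException;
}

/** Hypothetical: wraps a raw stream with a compression layer. */
interface CompressionCodec {
    OutputStream wrap(OutputStream raw) throws IOException;
}

/** Text serialization: the event body followed by a newline. */
final class TextLineSerializer implements EventSerializer {
    @Override
    public void write(SketchEvent event, OutputStream out) throws IOException {
        out.write(event.body);
        out.write('\n');
    }
}

/** GZIP compression via the JDK's GZIPOutputStream. */
final class GzipCodec implements CompressionCodec {
    @Override
    public OutputStream wrap(OutputStream raw) throws IOException {
        return new GZIPOutputStream(raw);
    }
}

final class FormatSketch {
    /** Sink-side code: depends only on the two interfaces. */
    static byte[] writeAll(List<SketchEvent> events, EventSerializer serializer, CompressionCodec codec) {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        try (OutputStream out = codec.wrap(target)) {
            for (SketchEvent event : events) {
                serializer.write(event, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The codec stream is closed here, so the GZIP trailer has been written.
        return target.toByteArray();
    }

    /** Helper for checking the output: decompress GZIP bytes to a string. */
    static String gunzip(byte[] compressed) {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(plain.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

      Swapping JSON for text, or Snappy for GZIP, would then mean supplying a different EventSerializer or CompressionCodec to writeAll, with no change to the sink code itself.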

          People

            Assignee: Unassigned
            Reporter: Eric Sammer (esammer)