Description
A special serializer for the HDFS sink was added to Flume a while back, but it is not documented. This serializer is useful when the source is any type of syslog source.
If the serializer is not specified, only the event body is written to the file. The timestamp and host are left out, which makes the logged events of little use.
The serializer can be configured on an hdfs sink like so:
agent1.sinks.k1.serializer=HEADER_AND_TEXT
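For context, here is a minimal sketch of a complete agent configuration in which that line might appear. The component names (r1, c1), the port, and the HDFS path are placeholders chosen for illustration and are not taken from this issue. The sketch also assumes that the serializer is only consulted when hdfs.fileType is DataStream (or CompressedStream) rather than the default SequenceFile; verify this against your Flume version.

```properties
# Hypothetical component names: r1 (source), c1 (channel), k1 (sink)
agent1.sources = r1
agent1.channels = c1
agent1.sinks = k1

# Syslog TCP source; host and port are placeholders
agent1.sources.r1.type = syslogtcp
agent1.sources.r1.host = 0.0.0.0
agent1.sources.r1.port = 5140
agent1.sources.r1.channels = c1

# Simple in-memory channel, used here only to keep the sketch short
agent1.channels.c1.type = memory

# HDFS sink; the path is a placeholder
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.channel = c1
agent1.sinks.k1.hdfs.path = /flume/syslog
# Assumption: a stream file type is needed for the serializer to take effect
agent1.sinks.k1.hdfs.fileType = DataStream
# Write the event headers (timestamp, host, ...) ahead of the body
agent1.sinks.k1.serializer = HEADER_AND_TEXT
```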
Without this serializer, an event is written to the file like this (for example):
adclient[12112]: INFO <bg:krb5.conf> daemon.main Start trusted domain discovery
When you specify the serializer, the same event looks like this:
{timestamp=1364380838000, Severity=6, host=myhostname, Facility=4}adclient[12112]: INFO <bg:krb5.conf> daemon.main Start trusted domain discovery
This form is much more useful, because each line carries its timestamp, host, severity, and facility.