[FLUME-1435] Proposal of Transactional Multiplex (fan out) Sink - ASF JIRA

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.2.0
Fix Version/s: None
Component/s: Channel, Sinks+Sources
Labels:
- features

Description

Hi,

I have proposed this design by email several weeks ago. I received comment from Brock. I guess your guys are very busy, so I think I'd better create this JIRA, and put slides and patch here to explain it more clearly.

Regards,
Yongkun

Following is the design from previous email, I will attach slides later.

From: "Wang, Yongkun" <yongkun.wang@mail.rakuten.com>
Date: Wed, 25 Jul 2012 10:32:31 GMT
To: "dev@flume.apache.org" <dev@flume.apache.org>
Cc: "user@flume.apache.org" <user@flume.apache.org>
Subject: Transactional Multiplex (fan out) Sink

Hi,

In our system, we need to fan out the aggregated flow to several destinations. Usually the flow to each destination is identical.

There is a nice feature of NG, the "multiplexing flow", which can satisfy our requirements. It is implemented by using separated channels, which is easy to do transaction control.

But in our case, the fan out is replicating in most cases. If using the current "Replicating to Channels" configuration, we will get several identical channels on the same host, which may consume a large amount resources (memory, disk, etc.). The performance may possibly drop. And the events to each destination may not be synchronized.

I read NG source, I think I could move the multiplex from Channel to Sink, that is, using single Channel, fan out to different Sinks, which may solve the problems (resource usage, performance, event synchronization) of multiple Channels.

I see that there are "LoadBalancingSinkProcessor" and "SinkSelector" classes, but they cannot be used to achieve the target of replicating events from one Channel to different Sinks.

The following is an optional implementation of the Transactional Multiplex (or fan out) Sink:
1. Add a Transactional Multiplex Sink Processor, which will group the operations of all fan out Sinks into one transaction, and use a certain policy to commit the transaction.
2. Add MultiplexSink, which simply processes the Events and report status, no transaction.
3. Add "peek()" and "remove()" to Channel and Transaction.

The policy of committing a transaction can be defined as follow (suppose we have N Sinks) :
1. When M(0=<M<=N) Sinks succeed;
e.g. value: ANY, ONE, QUARUM, ALL
2. When specified M(0<M<=N) Sinks (important sinks) succeed.

A selector can also be used by Transactional Multiplex Sink Processor to filter the events for some Sinks (Optional).

And this can be combined with the existing Multiplex Channel Flow: Multiplex events into different Channels, each Channel can replicate to different Sinks.

Would like to hear your suggestions firstly.
If it is reasonable, I will create a ticket in JIRA and provide the patch for review.

Cheers,
Yongkun Wang (Kun)

Attachments

Issue Links

is related to

FLUME-2833 Implement a "replicating" sink processor

Open

relates to

FLUME-1479 Multiple Sinks can connect to single Channel

Resolved

Activity

People

Assignee:: Yongkun Wang

Reporter:: Yongkun Wang

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 08/Aug/12 06:03

Updated:: 30/Oct/15 09:37