I can access JIRA now. It may be better to copy the discussion here:
On 12/08/10 14:48, "Wang, Yongkun | Yongkun | BDD" <firstname.lastname@example.org> wrote:
I am working on the patch now; it's not difficult. I have listed the changes in that JIRA.
I think you misunderstand my design: I don't maintain the order of the events. Instead, I make sure that each sink gets the same events (or different events, as specified by a selector).
Suppose the channel (mc) contains the following events: 4,3,2,1
If several sinks are simply enabled on it by configuration, it may work like this:
Sink "hsa" may get 1,3;
Sink "hsb" may get 2,4;
So different sinks will get different data. Is this what the user wants?
In my design, "hsa" and "hsb" will both get "4,3,2,1". This is a typical case when a user wants to fan out the data to two places (e.g. one for batch and another for real-time analysis).
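To make the difference concrete, here is a minimal sketch of the two behaviors. It is only an illustration, not Flume code: the function names competing_take and fan_out_take are hypothetical, the channel is modeled as a plain queue, and the take order 1,2,3,4 and strict round-robin are assumptions chosen to reproduce the example above.

```python
from collections import deque


def competing_take(events, sink_names):
    """Several sinks on one channel: each event goes to exactly one sink.

    Round-robin is assumed here; with sinks of different speeds the
    split would be unpredictable.
    """
    channel = deque(events)
    received = {name: [] for name in sink_names}
    turn = 0
    while channel:
        event = channel.popleft()
        received[sink_names[turn % len(sink_names)]].append(event)
        turn += 1
    return received


def fan_out_take(events, sink_names):
    """Fan-out: every event taken from the channel is given to every sink."""
    channel = deque(events)
    received = {name: [] for name in sink_names}
    while channel:
        event = channel.popleft()
        for name in sink_names:
            received[name].append(event)
    return received


events = [1, 2, 3, 4]
print(competing_take(events, ["hsa", "hsb"]))  # {'hsa': [1, 3], 'hsb': [2, 4]}
print(fan_out_take(events, ["hsa", "hsb"]))    # both sinks hold [1, 2, 3, 4]
```

In the first case the two sinks partition the data; in the second each sink sees the full set, which is what the batch plus real-time use case needs.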
On 12/08/10 14:29, "Denny Ye" <email@example.com> wrote:
JIRA can be accessed now.
I think it might be difficult to understand the order of events in
your proposal. If we don't care about the order, we can discuss the value
and feasibility. In my opinion, the data ingest flow is order-unaware; at
least, ordering is not that important for us. You can try to verify your
proposal and give us the results. There may be some difficulties in keeping transaction with
2012/8/10 Wang, Yongkun | Yongkun | BDD <firstname.lastname@example.org>
Is JIRA down again? I cannot connect to it or comment there.
I have a proposal, "Transactional Multiplex (fan out) Sink",
which contains the design of one channel feeding multiple sinks.
You can search for that email, since JIRA cannot be accessed.
I think this is more than a configuration issue. If several sinks are simply enabled on the same channel, they will take events either in round-robin fashion or, if the sinks run at different speeds, in an unpredictable pattern.
So it's better to have an even higher-level transaction control instead of the transaction in the process() of each sink, as I describe in FLUME-1435.
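As a rough illustration of what "higher-level transaction control" could mean, the sketch below takes one batch from the channel and removes it only after every sink has accepted it. This is an assumption about the general idea, not the actual design in FLUME-1435: ListSink, stage, commit, rollback and process_fan_out are all hypothetical names, and the channel is modeled as a plain list.

```python
class ListSink:
    """Hypothetical sink that stages a batch before committing it."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail        # simulate an unavailable destination
        self.committed = []
        self._staged = []

    def stage(self, batch):
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        self._staged = list(batch)

    def commit(self):
        self.committed.extend(self._staged)
        self._staged = []

    def rollback(self):
        self._staged = []


def process_fan_out(channel, sinks, batch_size=2):
    """One transaction spanning all sinks instead of one per sink.

    Returns True and removes the batch from the channel only if every
    sink staged it; otherwise rolls back and leaves the channel as is.
    """
    batch = channel[:batch_size]
    try:
        for sink in sinks:
            sink.stage(batch)
    except RuntimeError:
        for sink in sinks:
            sink.rollback()
        return False
    for sink in sinks:
        sink.commit()
    del channel[:batch_size]
    return True
```

With two healthy sinks and a channel of [1, 2, 3, 4], one call commits [1, 2] to both and leaves [3, 4] in the channel; if any sink fails, nothing is committed and the batch stays available for retry.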