Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Won't Fix
0.9.4
None
None
Description
Currently, batching and compression can be specified as data flow elements (decorators), but subtle issues make them difficult to use effectively, especially in the end-to-end (E2E) case.
The proposal is to add compression and batching features to the rpc sinks themselves. This will likely require adding a "flush" or "sync" call to the sink/decorator interface, but it will greatly simplify the use of these optimizations from a user's perspective.
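A minimal Java sketch of what a sink with built-in batching and compression could look like. It assumes a hypothetical `EventSink` interface carrying the proposed `flush()` call and a hypothetical `Transport` standing in for the RPC connection; neither is Flume's actual API. The sink buffers event bodies, and once `batchCount` bodies accumulate (or `flush()` is called) it length-prefixes them into one batch, gzips it, and sends it as a single payload.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical sink interface; flush() is the call the proposal would add.
interface EventSink {
    void append(byte[] body) throws IOException;
    void flush() throws IOException; // push any buffered events downstream now
}

// Hypothetical stand-in for the RPC connection to the collector.
interface Transport {
    void send(byte[] payload) throws IOException;
}

// Sketch of an rpc sink that owns batching and gzip compression itself.
final class BatchingGzipRpcSink implements EventSink {
    private final Transport transport;
    private final int batchCount;
    private final List<byte[]> buffer = new ArrayList<>();

    BatchingGzipRpcSink(Transport transport, int batchCount) {
        if (batchCount < 1) throw new IllegalArgumentException("batchCount must be >= 1");
        this.transport = transport;
        this.batchCount = batchCount;
    }

    @Override
    public void append(byte[] body) throws IOException {
        buffer.add(body);
        if (buffer.size() >= batchCount) {
            flush(); // a full batch goes out immediately
        }
    }

    @Override
    public void flush() throws IOException {
        if (buffer.isEmpty()) {
            return; // nothing to send
        }
        transport.send(gzip(frame(buffer)));
        buffer.clear();
    }

    // Length-prefix each body so the receiver can split the batch apart again.
    private static byte[] frame(List<byte[]> bodies) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeInt(bodies.size());
            for (byte[] body : bodies) {
                data.writeInt(body.length);
                data.write(body);
            }
        }
        return out.toByteArray();
    }

    private static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> sent = new ArrayList<>();
        BatchingGzipRpcSink sink = new BatchingGzipRpcSink(sent::add, 100);
        for (int i = 0; i < 250; i++) {
            sink.append(new byte[] {(byte) i});
        }
        sink.flush(); // sends the trailing partial batch of 50
        System.out.println(sent.size() + " payloads sent");
    }
}
```

Because the buffering lives inside the sink, an explicit `flush()` is what lets a caller (for example, on shutdown) push out a partial batch; without it, up to `batchCount - 1` events could sit in memory indefinitely.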
Here are some examples.
This is OK:
batch(100) gzip rpcSink("xxx",1234)
In the new implementation, it would look something like:
rpcSink("xxx",1234, compression="gzip", batch="count(100)")
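To illustrate how the proposed keyword arguments might map onto sink settings, here is a sketch of a parser for them. `RpcSinkOptions` is a hypothetical name, and only the `count(N)` batch form from the example above is recognized; no other batch forms are assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for the proposed rpcSink keyword arguments.
final class RpcSinkOptions {
    // Only the "count(N)" batch form from the example is recognized here.
    private static final Pattern COUNT = Pattern.compile("count\\((\\d+)\\)");

    final String host;
    final int port;
    final String compression; // null means no compression
    final int batchCount;     // 1 means no batching

    private RpcSinkOptions(String host, int port, String compression, int batchCount) {
        this.host = host;
        this.port = port;
        this.compression = compression;
        this.batchCount = batchCount;
    }

    static RpcSinkOptions parse(String host, int port, String compression, String batch) {
        if (compression != null && !compression.equals("gzip")) {
            throw new IllegalArgumentException("unsupported compression: " + compression);
        }
        int count = 1;
        if (batch != null) {
            Matcher m = COUNT.matcher(batch.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("unsupported batch spec: " + batch);
            }
            count = Integer.parseInt(m.group(1));
            if (count < 1) {
                throw new IllegalArgumentException("batch count must be positive");
            }
        }
        return new RpcSinkOptions(host, port, compression, count);
    }

    public static void main(String[] args) {
        // Mirrors: rpcSink("xxx",1234, compression="gzip", batch="count(100)")
        RpcSinkOptions opts = parse("xxx", 1234, "gzip", "count(100)");
        System.out.println(opts.host + ":" + opts.port + " " + opts.compression + " x" + opts.batchCount);
    }
}
```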
Ideally, the rpcSources will be able to accept compressed or batched data directly.
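One way a source could accept both kinds of data is to sniff the gzip magic bytes (`0x1f 0x8b`, defined in RFC 1952) and always unframe batches. This sketch assumes a hypothetical wire format in which every payload is an int count followed by length-prefixed bodies, so an unbatched event is just a batch of one; `TolerantDecoder` and its methods are illustrative names, not Flume classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch of source-side decoding that accepts plain or gzip-compressed payloads.
final class TolerantDecoder {
    // gzip streams always begin with the magic bytes 0x1f 0x8b.
    static boolean isGzip(byte[] p) {
        return p.length >= 2 && (p[0] & 0xff) == 0x1f && (p[1] & 0xff) == 0x8b;
    }

    // Decompress if needed, then split the (assumed) framed batch into bodies.
    static List<byte[]> decode(byte[] payload) throws IOException {
        InputStream in = new ByteArrayInputStream(payload);
        if (isGzip(payload)) {
            in = new GZIPInputStream(in);
        }
        try (DataInputStream data = new DataInputStream(in)) {
            int count = data.readInt();
            List<byte[]> bodies = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] body = new byte[data.readInt()];
                data.readFully(body);
                bodies.add(body);
            }
            return bodies;
        }
    }

    // Sender-side counterpart, included only to build example payloads.
    static byte[] encode(List<byte[]> bodies, boolean compress) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        OutputStream target = compress ? new GZIPOutputStream(raw) : raw;
        try (DataOutputStream data = new DataOutputStream(target)) {
            data.writeInt(bodies.size());
            for (byte[] b : bodies) {
                data.writeInt(b.length);
                data.write(b);
            }
        } // closing finishes the gzip trailer when compressing
        return raw.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> bodies = List.of("a".getBytes(), "bc".getBytes());
        for (boolean compress : new boolean[] {false, true}) {
            byte[] payload = encode(bodies, compress);
            System.out.println("gzip=" + isGzip(payload) + " events=" + decode(payload).size());
        }
    }
}
```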
Here are examples of configurations that seem inconsistent and take too long to explain (and are thus too complicated):
Today, this should work essentially as expected:
agent : source | batch(100) gzip agentBESink("collector"); collector : collectorSource | gunzip unbatch collectorSink("XXX");
This works, but may not behave the way one would expect: events in the batching buffer can be lost because the WAL write happens after batching/gzipping.
agent : source | batch(100) gzip agentE2ESink("collector"); collector : collectorSource | gunzip unbatch collectorSink("XXX");
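The loss in this configuration can be seen in a toy model (not Flume code) in which a batching decorator sits in front of the WAL: events only become durable once a full batch reaches the log, so anything still in the in-memory batch buffer at crash time is gone. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: batching happens before the write-ahead log, as in the config above.
final class BatchBeforeWalModel {
    private final int batchSize;
    private final List<String> batchBuffer = new ArrayList<>(); // in memory only
    private final List<List<String>> wal = new ArrayList<>();   // durable log

    BatchBeforeWalModel(int batchSize) {
        this.batchSize = batchSize;
    }

    void append(String event) {
        batchBuffer.add(event);
        if (batchBuffer.size() == batchSize) {
            wal.add(new ArrayList<>(batchBuffer)); // only now is anything durable
            batchBuffer.clear();
        }
    }

    // Simulate a crash: the batch buffer is lost, the WAL survives.
    int crashAndCountRecoverable() {
        batchBuffer.clear();
        int recoverable = 0;
        for (List<String> batch : wal) {
            recoverable += batch.size();
        }
        return recoverable;
    }

    public static void main(String[] args) {
        BatchBeforeWalModel model = new BatchBeforeWalModel(100);
        for (int i = 0; i < 250; i++) {
            model.append("event-" + i);
        }
        System.out.println("recoverable after crash: " + model.crashAndCountRecoverable() + " of 250");
    }
}
```

With a batch size of 100 and 250 appended events, only 200 are recoverable after the simulated crash; the last 50 were never written to the WAL.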
This one will not work: compressed events have a zero-size body, and acks are computed over bodies, so the acks are worthless.
agent : source | batch(100) gzip agentE2ESink("collector"); collector : collectorSource | collector(30000) { gunzip unbatch escapedCustomDfs("XXX","yyy") };
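Why body-based acks break can be shown with a simplified checksum. This sketch assumes, as the issue describes, that the gzip step leaves the event body empty (here the compressed bytes are parked in a hypothetical `compressedAttr` field), and it models the ack checksum as a CRC32 over the body only; Flume's actual ack computation may differ in detail. Every compressed event then yields the same checksum, so an ack cannot tell which events arrived.

```java
import java.util.zip.CRC32;

// Hypothetical event: body plus a field holding the compressed batch.
final class SimpleEvent {
    final byte[] body;
    final byte[] compressedAttr;

    SimpleEvent(byte[] body, byte[] compressedAttr) {
        this.body = body;
        this.compressedAttr = compressedAttr;
    }
}

final class BodyAck {
    // Simplified ack checksum that, like the acks in the issue, only covers the body.
    static long checksum(SimpleEvent e) {
        CRC32 crc = new CRC32();
        crc.update(e.body, 0, e.body.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        SimpleEvent a = new SimpleEvent(new byte[0], new byte[] {1, 2, 3});
        SimpleEvent b = new SimpleEvent(new byte[0], new byte[] {9, 9});
        // Different compressed contents, identical body checksums.
        System.out.println(checksum(a) == checksum(b));
    }
}
```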
Issue Links
blocks: FLUME-418 Add batching and compression arguments to agent and collectors (Closed)