[METRON-322] Global Batching and Flushing - ASF JIRA

Details

Type: Improvement
Status: Done
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: 0.4.1
Labels: None

Description

All Writers and other bolts that maintain an internal "batch" queue need a timeout flush, so that messages from low-volume telemetries do not sit in their queues indefinitely. Storm has a timeout value (topology.message.timeout.secs) that limits how long it waits for a tuple to be acked. If the Writer does not process its queue before that timeout expires, Storm replays the tuples through the topology. This has multiple undesirable consequences, including data duplication and wasted compute resources. We would like to be able to specify an interval after which the queues flush, even if the batch size has not been reached.
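As an illustration of the requested behavior, the following is a minimal, Storm-free sketch of a queue that flushes either when it reaches the batch size or when its oldest message has waited longer than a timeout. The class and method names (TimedBatchQueue, add, onTick, pending) are hypothetical and are not Metron's actual writer classes; time is passed in explicitly so the behavior is easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical batch queue that flushes on size or on age (illustration only). */
final class TimedBatchQueue<T> {
    private final int batchSize;
    private final long batchTimeoutMillis;
    private final Consumer<List<T>> writer;
    private final List<T> queue = new ArrayList<>();
    private long oldestEnqueuedAt; // meaningful only while the queue is non-empty

    TimedBatchQueue(int batchSize, long batchTimeoutMillis, Consumer<List<T>> writer) {
        this.batchSize = batchSize;
        this.batchTimeoutMillis = batchTimeoutMillis;
        this.writer = writer;
    }

    /** Queue a message; flush immediately once batchSize is reached. */
    void add(T message, long nowMillis) {
        if (queue.isEmpty()) {
            oldestEnqueuedAt = nowMillis; // start the clock for this batch
        }
        queue.add(message);
        if (queue.size() >= batchSize) {
            flush();
        }
    }

    /** Call periodically; flushes a partial batch that has waited too long. */
    void onTick(long nowMillis) {
        if (!queue.isEmpty() && nowMillis - oldestEnqueuedAt >= batchTimeoutMillis) {
            flush();
        }
    }

    /** Number of messages currently waiting in the queue. */
    int pending() {
        return queue.size();
    }

    private void flush() {
        writer.accept(new ArrayList<>(queue)); // hand the writer its own copy
        queue.clear();
    }
}
```

In a real bolt, onTick would be driven by some periodic signal, such as a Storm tick tuple, rather than called by hand, and the writer callback would also be responsible for acking the flushed tuples.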

We will use the Storm tick tuple to trigger timeout flushing, following the recommendations of the article at
http://hortonworks.com/blog/apache-storm-design-pattern-micro-batching/#CONCLUSION

Since every Writer processes its queue somewhat differently, every bolt that has a "batchSize" parameter will also be given a "batchTimeout" parameter. It will default to 1/2 the value of "topology.message.timeout.secs", as recommended, and settings larger than the default will be ignored, since they could cause a failure to flush in time. In the Enrichment topology, where two Writers may be placed one after the other (enrichment and threat intel), the default timeout interval will be 1/4 the value of "topology.message.timeout.secs". The default value of "topology.message.timeout.secs" in Storm is 30 seconds, which gives defaults of 15 seconds and 7.5 seconds respectively.
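The default and the cap described above can be sketched as a small calculation. This is only an illustration of the rule as stated in this description, not Metron's actual implementation: the class BatchTimeoutDefaults, its method names, and the writersInSeries parameter are hypothetical, and treating a requested value of 0 (or less) as "unset" is an assumption.

```java
/** Hypothetical helper illustrating the batchTimeout default and cap (illustration only). */
final class BatchTimeoutDefaults {
    /** Storm's default for topology.message.timeout.secs. */
    static final int STORM_DEFAULT_MESSAGE_TIMEOUT_SECS = 30;

    /**
     * Largest allowed batchTimeout: half the message timeout, divided again
     * by the number of batching Writers placed one after the other.
     */
    static double maxBatchTimeoutSecs(int messageTimeoutSecs, int writersInSeries) {
        return messageTimeoutSecs / (2.0 * writersInSeries);
    }

    /**
     * Assumption: a requested value <= 0 means "unset" and selects the default.
     * Requested values above the maximum are ignored in favor of the default.
     */
    static double effectiveBatchTimeoutSecs(double requestedSecs,
                                            int messageTimeoutSecs,
                                            int writersInSeries) {
        double max = maxBatchTimeoutSecs(messageTimeoutSecs, writersInSeries);
        return (requestedSecs <= 0 || requestedSecs > max) ? max : requestedSecs;
    }
}
```

With the 30-second Storm default, an unset batchTimeout would resolve to 15 seconds for a single Writer and 7.5 seconds for two Writers in series; a requested 20 seconds would be replaced by 15, while a requested 5 seconds would be kept.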

In addition, Storm provides a limit on the number of pending messages that have not been acked. Once "topology.max.spout.pending" tuples from a spout task are waiting in a topology, the spout stops emitting until some of them are acked or failed, so a batch that can no longer fill would sit until the message timeout causes its tuples to be replayed. However, the default value of "topology.max.spout.pending" is null, and if it is set to a non-null value, the user can manage the consequences by setting batchSize limits appropriately. The timeout flush will also ameliorate this issue, so we do not need to address "topology.max.spout.pending" directly in this task.

Edited 11 Aug 2017: Time-based flushing for ParserWriter and related classes was moved to METRON-1105, allowing this (lengthy) JIRA to be closed with https://github.com/apache/metron/pull/481

Issue Links

is related to: METRON-1105 Timeout-based batch flushing for ParserWriter and related classes (To Do)
relates to: METRON-227 Add Time-Based Flushing to Writer Bolt (Done)
links to: GitHub Pull Request #442, GitHub Pull Request #481

Sub-Tasks

1.	[CANCELLED] Add zookeeper flag to turn global flushing on and off	Done	Matthew Foley
2.	Add batchTimeout parameters with every use of batchSize parameters, defaulting to 0	Done	Matthew Foley
3.	Add new functionality to BulkMessageWriter and related classes for batchTimeout flushing	Done	Matthew Foley
4.	Add test cases for BulkMessageWriter with batchTimeout	Done	Matthew Foley
5.	Document configuration for batch timeout and related parameters	Done	Matthew Foley

People

Assignee: Matthew Foley
Reporter: Ajay Kumar
Votes: 0
Watchers: 4

Dates

Created: 13/Jul/16 19:11
Updated: 08/Sep/17 21:02
Resolved: 11/Aug/17 20:54