Uploaded image for project: 'Metron (Retired)'
  1. Metron (Retired)
  2. METRON-322

Global Batching and Flushing

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Done
    • Major
    • Resolution: Done
    • None
    • 0.4.1
    • None

    Description

      All Writers and other bolts that maintain an internal "batch" queue, need to have a timeout flush, to prevent messages from low-volume telemetries from sitting in their queues indefinitely. Storm has a timeout value (topology.message.timeout.secs) that prevents it from waiting for too long. If the Writer does not process the queue before the timeout, then Storm recycles the tuples through the topology. This has multiple undesirable consequences, including data duplication and waste of compute resources. We would like to be able to specify an interval after which the queues would flush, even if the batch size is not met.

      We will utilize the Storm Tick Tuple to trigger timeout flushing, following the recommendations of the article at
      http://hortonworks.com/blog/apache-storm-design-pattern-micro-batching/#CONCLUSION
      Since every Writer processes its queue somewhat differently, every bolt that has a "batchSize" parameter will be given a "batchTimeout" parameter too. It will default to 1/2 the value of "topology.message.timeout.secs", as recommended, and will ignore settings larger than the default, which could cause failure to flush in time. In the Enrichment topology, where two Writers may be placed one after the other (enrichment and threat intel), the default timeout interval will be 1/4 the value of "topology.message.timeout.secs". The default value of "topology.message.timeout.secs" in Storm is 30 seconds.

      In addition, Storm provides a limit on the number of pending messages that have not been acked. If more than "topology.max.spout.pending" messages are waiting in a topology, then Storm will recycle them through the topology. However, the default value of "topology.max.spout.pending" is null, and if set to non-null value, the user can manage the consequences by setting batchSize limits appropriately. Having the timeout flush will also ameliorate this issue. So we do not need to address "topology.max.spout.pending" directly in this task.

      Edited 11 Aug 2017: Time-based flushing for ParserWriter and related classes moved to METRON-1105, allowing this (lengthy) jira to be closed with https://github.com/apache/metron/pull/481

      Attachments

        Issue Links

          Activity

            People

              mattf Matthew Foley
              ajayydv Ajay Kumar
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: