Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
While running multi-node tests with continuous ingest, I was watching the ingest performance on the peer via the monitor.
I noticed that the ingest rate followed a regular pattern: ingest would spike, then decrease at a (mostly) fixed interval, flat-line, and repeat.
I believe each cycle on the ingest graph corresponds to the replication of one file from the primary. The reduction in throughput scales with the time it takes to re-read the "prefix" of the file that we have already replicated. I need to push some more logic down into the AccumuloReplicaSystem so that we can avoid the growing penalty of seeking over data that we don't need to re-process.
The cost is more complexity in the AccumuloReplicaSystem, but I imagine that once I write an implementation that replicates to some other system, it will become more obvious which common pieces can be abstracted into a shared base class.
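
To illustrate the idea of not re-reading the already-replicated prefix, here is a minimal sketch in Java. It is not the actual AccumuloReplicaSystem code: the class ResumableLogReader, the Batch type, and the record encoding (writeUTF strings in a byte array) are all hypothetical stand-ins. The only point it shows is that each pass returns the byte offset at which it stopped, so the next pass can skip straight there instead of re-parsing everything before it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: resume reading a log from a remembered byte offset. */
class ResumableLogReader {

    /** One replication pass: the records read, plus the offset to resume from. */
    static final class Batch {
        final List<String> records;
        final long nextOffset;

        Batch(List<String> records, long nextOffset) {
            this.records = records;
            this.nextOffset = nextOffset;
        }
    }

    /** Encodes records as length-prefixed strings (assumed format, for the demo only). */
    static byte[] encode(String... records) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (String r : records) {
                out.writeUTF(r);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Reads up to maxRecords records starting at the given byte offset.
     * The prefix before the offset is skipped, not re-parsed.
     */
    static Batch readFrom(byte[] log, long offset, int maxRecords) {
        try {
            ByteArrayInputStream bytes = new ByteArrayInputStream(log);
            bytes.skip(offset); // jump over the already-replicated prefix
            DataInputStream in = new DataInputStream(bytes);
            List<String> records = new ArrayList<>();
            while (records.size() < maxRecords && bytes.available() > 0) {
                records.add(in.readUTF());
            }
            // Position reached in the log; the caller would persist this.
            long nextOffset = log.length - bytes.available();
            return new Batch(records, nextOffset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would store nextOffset alongside its record of what has been replicated for that file and pass it back in on the next pass. Where that state should live, and how it interacts with a file that is still being appended to on the primary, is left open here.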