[ACCUMULO-2846] Need to re-use DataInputStream for reading files that need replication - ASF JIRA


Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.7.0
Component/s: replication
Labels: None

Description

While running multi-node tests with continuous ingest, I was watching the ingest performance on the peer via the monitor.

I noticed that the ingest rate followed a regular pattern: ingest would spike, then decrease in regular, (mostly) fixed steps, flat-line, and then repeat.

I believe each cycle on the ingest graph corresponds to the replication of one file from the primary. The reduction in throughput is proportional to the time it takes to re-read the "prefix" of the file that we already replicated: each replication pass starts reading the file from the beginning and has to get past everything already sent before it reaches new data, so the penalty grows as the file grows. I need to push some more logic down into the AccumuloReplicaSystem so that we can avoid that growing penalty for seeking over data we don't need to re-process.
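A minimal sketch of the idea in the title, re-using one open DataInputStream per file across replication passes so each pass resumes where the previous one stopped. This is not the AccumuloReplicaSystem code: the class `ReusableStreamReplicator`, its methods, and the record format (length-prefixed byte arrays) are hypothetical stand-ins for illustration only.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: keeps one open DataInputStream per file between
 * replication passes, so a pass continues from the last consumed record
 * instead of reopening the file and re-reading the replicated prefix.
 * Records are modeled as [int length][bytes]; real WAL entries differ.
 */
public class ReusableStreamReplicator {

  /** An open stream plus how many records have been read from it. */
  private static final class Cursor {
    final DataInputStream in;
    long recordsConsumed;

    Cursor(DataInputStream in) {
      this.in = in;
    }
  }

  private final Map<String, Cursor> openFiles = new HashMap<>();
  // Hypothetical hook that opens a file by name (e.g. a filesystem open call).
  private final Function<String, InputStream> opener;

  public ReusableStreamReplicator(Function<String, InputStream> opener) {
    this.opener = opener;
  }

  /** Reads up to maxRecords new records, opening the file only on first use. */
  public List<byte[]> nextBatch(String file, int maxRecords) throws IOException {
    Cursor c = openFiles.computeIfAbsent(
        file, f -> new Cursor(new DataInputStream(opener.apply(f))));
    List<byte[]> batch = new ArrayList<>();
    while (batch.size() < maxRecords) {
      int len;
      try {
        len = c.in.readInt(); // length prefix of the next record
      } catch (EOFException e) {
        break; // no more complete records for now
      }
      byte[] record = new byte[len];
      c.in.readFully(record); // sketch assumes records are never truncated
      batch.add(record);
      c.recordsConsumed++;
    }
    return batch;
  }

  /** Number of records already handed out for this file (0 if never opened). */
  public long consumed(String file) {
    Cursor c = openFiles.get(file);
    return c == null ? 0 : c.recordsConsumed;
  }

  /** Closes and forgets the cached stream, e.g. once a file is fully replicated. */
  public void close(String file) throws IOException {
    Cursor c = openFiles.remove(file);
    if (c != null) {
      c.in.close();
    }
  }
}
```

With this shape, replicating a file in several batches opens it once, and each batch costs only the records it actually ships. The trade-off is the one described below: the replica system now has to manage open streams (and close them when a file is done), which is extra state it did not hold before.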

The cost is more complexity in the AccumuloReplicaSystem, but I imagine that once I write an implementation that replicates to some other system, it will become more obvious which common pieces can be abstracted into a common base class.

Attachments


- ingest-graph.jpg (27/May/14 19:12, 54 kB, Josh Elser)
- patched-ingest-graph.jpg (27/May/14 19:30, 49 kB, Josh Elser)


People

Assignee: Josh Elser

Reporter: Josh Elser

Votes: 0

Watchers: 0

Dates

Created: 27/May/14 18:14

Updated: 28/May/14 16:45

Resolved: 28/May/14 16:45