  1. Accumulo
  2. ACCUMULO-378 Multi data center replication
  3. ACCUMULO-2846

Need to re-use DataInputStream for reading files that need replication

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: replication
    • Labels: None

    Description

      While running multi-node tests with continuous ingest, I watched the ingest performance on the peer via the monitor.

      I noticed that the ingest rate followed a regular pattern: it would spike, then decrease at a (mostly) fixed interval, flat-line, and then repeat.

      I believe each cycle on the ingest graph corresponds to the replication of one file from the primary. The reduction in throughput is proportional to the time it takes to re-read the "prefix" of the file that we have already replicated. I need to push more logic down into the AccumuloReplicaSystem so that we avoid the growing penalty of reading past data we don't need to re-process.
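
      To make the suspected cost concrete, here is a minimal sketch of the two read patterns. It is not the AccumuloReplicaSystem code: the class and method names are hypothetical, fixed-size int records stand in for the real file entries, and an in-memory byte array stands in for the file. It only counts how many records each approach has to deserialize.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names exist in Accumulo.
final class ReplicationReadSketch {

    // Build an in-memory "file" of n int records (assumed stand-in for real entries).
    static byte[] encode(int n) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < n; i++) {
                out.writeInt(i);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read up to 'limit' records and return how many were read before end of file.
    private static int readBatch(DataInputStream in, int limit) throws IOException {
        int read = 0;
        while (read < limit) {
            try {
                in.readInt();
            } catch (EOFException e) {
                break;
            }
            read++;
        }
        return read;
    }

    // Pattern 1: open a new stream for every batch and re-read the already
    // replicated prefix before reaching new data. Returns total records read.
    static long reopenEachBatch(byte[] file, int batchSize) {
        long touched = 0;
        int replicated = 0;
        try {
            while (true) {
                try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
                    touched += readBatch(in, replicated); // the growing penalty
                    int sent = readBatch(in, batchSize);
                    touched += sent;
                    if (sent == 0) {
                        return touched;
                    }
                    replicated += sent;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Pattern 2: keep one DataInputStream open across batches, so each record
    // is read exactly once. Returns total records read.
    static long reuseStream(byte[] file, int batchSize) {
        long touched = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int sent;
            while ((sent = readBatch(in, batchSize)) > 0) {
                touched += sent;
            }
            return touched;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = encode(6);
        System.out.println("reopen each batch: " + reopenEachBatch(file, 2));
        System.out.println("reuse stream:      " + reuseStream(file, 2));
    }
}
```

      In this toy model the reopen-per-batch pattern reads a number of records that grows roughly quadratically with file length, while the reused stream reads each record once. That would be consistent with a throughput drop that worsens as more of the file has been replicated.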

      The cost is added complexity in the AccumuloReplicaSystem, but I imagine that once I write an implementation that replicates to some other system, it will become more obvious which common pieces can be abstracted into a common base class.

      Attachments

        1. patched-ingest-graph.jpg
          49 kB
          Josh Elser
        2. ingest-graph.jpg
          54 kB
          Josh Elser

          People

            Assignee: Josh Elser (elserj)
            Reporter: Josh Elser (elserj)