ACCUMULO-2580

Modify "bulk ingest" code path to create replication entries

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: replication
    • Labels:
      None

      Description

      When a file import is requested, we also need to create a replication entry for the file, so that we know to replicate it and the GC is not allowed to delete it.

        Activity

        Christopher Tubbs added a comment -

        One solution might be for tservers that write ~del entries to write ~repl entries instead when they detect that replication is enabled, and for the replication system to write the ~del entries when done replicating.

        Josh Elser added a comment -

        Yup, that would likely work, but you'd incur a big delay (replication would only start when the tserver wants to delete a file). I was thinking that we would add the ~repl record after the file is brought online (perhaps as an extra FATE op after the existing BulkImport op completes).
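
        The ordering proposed in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not Accumulo code: `Metadata`, `bulk_import`, `record_for_replication`, and `run_steps` are hypothetical names, and the `~repl<path>` string layout is a simplification for illustration rather than the real metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Toy in-memory stand-in for the metadata table (hypothetical)."""
    online_files: set = field(default_factory=set)
    repl_entries: set = field(default_factory=set)


def bulk_import(meta, path):
    # Stand-in for the existing bulk import step: bring the file online.
    meta.online_files.add(path)


def record_for_replication(meta, path):
    # Extra step that runs only after the import step has completed.
    if path not in meta.online_files:
        raise ValueError("file is not online yet: " + path)
    meta.repl_entries.add("~repl" + path)


def run_steps(meta, path, steps):
    # Run the steps strictly in order, as a chained sequence of operations.
    for step in steps:
        step(meta, path)


meta = Metadata()
run_steps(meta, "/bulk/f1.rf", [bulk_import, record_for_replication])
```

        Because the replication record is written as soon as the file is online, replication does not have to wait until a tserver wants to delete the file.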

        Christopher Tubbs added a comment -

        Oh right, that makes sense. I guess the options, then, are either to skip creating ~del entries (and rely on the replication code to create them), or to do additional checks in the gc to look for ~repl entries.

        Josh Elser added a comment -

        "do additional checks in the gc to look for ~repl entries."

        This is what I did for the normal "live" ingest path. Since the GC is running "out of band" anyways and the size of the replication is relatively small (the one locality group you would actually need to read), I don't think that's a terrible thing to do.
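
        The additional GC check discussed here can be sketched as a filter over deletion candidates. This is a minimal illustration under assumptions, not the actual garbage collector: `files_safe_to_delete` is a hypothetical helper, and the plain `~del<path>` / `~repl<path>` strings are a simplified stand-in for the real metadata entries.

```python
DEL_PREFIX = "~del"
REPL_PREFIX = "~repl"


def files_safe_to_delete(del_entries, repl_entries):
    """Return deletion candidates that have no pending replication entry."""
    # Paths that still have a ~repl entry must be kept until replication is done.
    pending = {
        entry[len(REPL_PREFIX):]
        for entry in repl_entries
        if entry.startswith(REPL_PREFIX)
    }
    # Paths the GC would normally delete.
    candidates = [
        entry[len(DEL_PREFIX):]
        for entry in del_entries
        if entry.startswith(DEL_PREFIX)
    ]
    return sorted(path for path in candidates if path not in pending)


deletable = files_safe_to_delete(
    ["~del/t/a.rf", "~del/t/b.rf"],
    ["~repl/t/b.rf"],
)
```

        In this example only `/t/a.rf` is returned; `/t/b.rf` is held back because a replication entry still refers to it.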


          People

          • Assignee:
            Unassigned
            Reporter:
            Josh Elser
          • Votes:
            0
            Watchers:
            1