ACCUMULO-3818

replication should quit if the destination table does not exist

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.7.0
    • Fix Version/s: None
    • Component/s: master, tserver
    • Labels:

      Description

      Restarting a replication Random Walk test, I noticed huge numbers of ERRORS:

      RemoteReplicationException(code:TABLE_DOES_NOT_EXIST, reason:Table with id 3 does not exist).

      Replication should quit and give up when the destination or source are deleted.

        Activity

        elserj Josh Elser added a comment -

        "Replication should quit and give up when the destination or source are deleted"

        What do you mean by "quit"? Exit the tabletserver/master? Delete the records for the source+dest?

        I think these are trickier than they appear at first glance.

        Consider the case when a source table is deleted: what if there are pending files to be replicated that the user did not know about? They thought the table was fully replicated elsewhere (where the table is desired) and are trying to clean up the table locally (where it is now unneeded). We would want to make sure that we still replicate the data even if the source table no longer exists. There may be some issues because the configuration required to replicate the table is now missing. Perhaps this can be mitigated with better documentation/tools on when a table is fully replicated.

        If a destination doesn't exist, this could indicate misconfiguration. This may be transient (especially if the user tried to update the configuration in ZK but the master hadn't noticed the change when it tried to run replication). I think we would want to be careful in how we fail in this case, as it can be corrected by user configuration.
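
        One way to picture "being careful in how we fail" for a possibly transient missing destination is to keep retrying, but with a capped exponential backoff and throttled logging, so that pending work is never dropped and the log is not flooded with TABLE_DOES_NOT_EXIST errors. The following is only a minimal sketch under that assumption. The class MissingTableBackoff and all of its methods are hypothetical and are not part of Accumulo.

```java
/**
 * Hypothetical sketch: this class and its method names are illustrative
 * and are not part of Accumulo.
 */
final class MissingTableBackoff {
  // Assumption: log the first failure, then every 10th consecutive failure.
  private static final int LOG_EVERY = 10;

  private final long initialDelayMs;
  private final long maxDelayMs;
  private int consecutiveFailures = 0;

  MissingTableBackoff(long initialDelayMs, long maxDelayMs) {
    if (initialDelayMs <= 0 || maxDelayMs < initialDelayMs) {
      throw new IllegalArgumentException("need 0 < initialDelayMs <= maxDelayMs");
    }
    this.initialDelayMs = initialDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  /** Records a failed attempt and returns the delay before the next attempt. */
  long recordFailure() {
    long delay = initialDelayMs;
    // Double the delay once per previous consecutive failure, capped at maxDelayMs.
    for (int i = 0; i < consecutiveFailures && delay < maxDelayMs; i++) {
      delay = (delay > maxDelayMs / 2) ? maxDelayMs : delay * 2;
    }
    consecutiveFailures++;
    return delay;
  }

  /** Resets the backoff, e.g. once the destination table appears. */
  void recordSuccess() {
    consecutiveFailures = 0;
  }

  /** True for the first failure and every LOG_EVERY-th failure after that. */
  boolean shouldLog() {
    return consecutiveFailures > 0
        && (consecutiveFailures == 1 || consecutiveFailures % LOG_EVERY == 0);
  }
}
```

        The sketch deliberately never gives up: the caller keeps its pending records and only waits longer between attempts, which leaves room for the user to correct the configuration.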

        mdrob Mike Drob added a comment -

        Bumping this out of 1.7 entirely, as it doesn't seem clear to me that we have a plan for a solution.

        ctubbsii Christopher Tubbs added a comment -

        Closing this as "Won't Fix" due to Josh Elser's comments. It doesn't appear that there's a clear and appropriate solution for a production use case.

        Perhaps we could improve Random Walk tests to prevent these errors? A new, more narrowly scoped ticket could address that.

        elserj Josh Elser added a comment -

        "Perhaps we could improve Random Walk tests to prevent these errors? A new, more narrowly scoped ticket could address that."

        +1

        "It doesn't appear that there's a clear and appropriate solution for a production use case."

        We might be able to do something to make this fail "better", but we'd have to apply a bit of thought to actually understand the problems, what should be retried (to avoid data loss), etc.


          People

          • Assignee:
            Unassigned
            Reporter:
            ecn Eric Newton
          • Votes:
            0
            Watchers:
            3
