Accumulo / ACCUMULO-2949

Write explicit "close" markers for WALs

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: logger, replication
    • Labels: None

      Description

      To ensure that WALs are not left in a dangling "open" state WRT replication, the garbage collector scans the tablets and constructs a view of WALs that are currently in use. It consults that view to determine which WALs can move to a "closed" replication state.
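      A minimal sketch of that inference follows. It assumes the GC can read each tablet's WAL references as a map from tablet to WAL names, and can read the set of WALs whose replication status is still "open". The class WalCloseInference and method closableWals are hypothetical and are not Accumulo APIs. The key point is that "not referenced right now" is treated as "finished forever".

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the GC-side inference; class and method names are
 * illustrative only and do not exist in Accumulo.
 */
public class WalCloseInference {

  /**
   * @param walsByTablet WALs referenced by each tablet's metadata, keyed by tablet
   * @param openWals     WALs whose replication status is still "open"
   * @return WALs the GC would move to a "closed" replication state
   */
  public static Set<String> closableWals(Map<String, Set<String>> walsByTablet,
                                         Set<String> openWals) {
    // Point-in-time view of every WAL that some tablet still references.
    Set<String> inUse = new HashSet<>();
    for (Set<String> wals : walsByTablet.values()) {
      inUse.addAll(wals);
    }
    // Any open WAL that no tablet references is *assumed* to be finished.
    Set<String> closable = new HashSet<>(openWals);
    closable.removeAll(inUse);
    return closable;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> refs = Map.of("tablet-1", Set.of("wal-a"));
    System.out.println(closableWals(refs, Set.of("wal-a", "wal-b"))); // prints [wal-b]
  }
}
```

      Because the view is a snapshot, nothing stops a tablet from referencing a "closable" WAL again after the GC pass.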

      This isn't entirely correct because a WAL can "come back" after it has been removed from a Tablet. Consider the following:

      1. Table has one tablet hosted on one tserver
      2. Tablet receives some mutations
      3. Tablet undergoes a minor compaction (MinC)
      4. Tablet removes its WAL entry as part of the MinC
      5. WAL is marked "closed" WRT replication
      6. Tablet receives more mutations and starts using the same WAL again
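      The six steps can be replayed in a small in-memory model to show where the inference goes wrong. This is a hypothetical sketch, not Accumulo code: one tablet, one WAL, a MinC that simply clears the tablet's WAL references, and a GC pass that marks every unreferenced WAL as closed. The model only flags mutations that land in a WAL already marked "closed"; it does not try to reproduce how the real replication status would react.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the six-step scenario; not Accumulo code. */
public class WalReuseScenario {
  // WALs referenced by the single tablet's metadata entry.
  private final Set<String> tabletWals = new HashSet<>();
  // Replication status per WAL: true once the WAL is marked "closed".
  private final Map<String, Boolean> closed = new HashMap<>();
  // Mutations written to a WAL after it was already marked "closed".
  private final List<String> writesAfterClose = new ArrayList<>();

  void write(String wal, String mutation) {
    tabletWals.add(wal);
    closed.putIfAbsent(wal, false);
    if (closed.get(wal)) {
      writesAfterClose.add(mutation);
    }
  }

  void minorCompact() {
    // Steps 3-4: the MinC drops the tablet's reference to its WAL.
    tabletWals.clear();
  }

  void gcPass() {
    // Step 5: any WAL the tablet no longer references is inferred closed.
    for (Map.Entry<String, Boolean> e : closed.entrySet()) {
      if (!tabletWals.contains(e.getKey())) {
        e.setValue(true);
      }
    }
  }

  public List<String> run() {
    write("wal-1", "m1"); // step 2
    minorCompact();       // steps 3-4
    gcPass();             // step 5
    write("wal-1", "m2"); // step 6: the same WAL is used again
    return writesAfterClose;
  }

  public static void main(String[] args) {
    System.out.println(new WalReuseScenario().run()); // prints [m2]
  }
}
```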

      There are a couple of ways this could present itself, each of which would result in re-replicating data that may already have been sent once. On an active system I don't think this is a big concern, and we already don't guarantee a "once and only once" replication contract, so this isn't critical. The combiner set on the replication table will also mitigate most re-replication concerns, because those records persist until the entire file is replicated (which should outlast the WAL's use on the local system).

      Eric Newton suggested recording a "closed" marker for a WAL as part of TabletServerLogger.close(), which would remove the need to "guess" when a WAL will no longer be used.
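      A minimal sketch of the explicit-marker idea, under stated assumptions: the only name taken from the issue is TabletServerLogger.close(); the class ExplicitWalClose and its methods are hypothetical. The logger records a marker for a WAL only when it rolls away from it and will never write to it again, and replication treats a WAL as finished only once that marker exists. A MinC removing tablet references therefore has no effect on the decision.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of explicit WAL close markers; names are illustrative
 * and are not Accumulo APIs.
 */
public class ExplicitWalClose {
  // In-memory stand-in for markers that a real implementation would persist.
  private final Set<String> closeMarkers = new HashSet<>();
  private String currentWal;

  /** Switches to a new WAL; the previous one is closed and marked. */
  public void roll(String newWal) {
    if (currentWal != null) {
      close(currentWal);
    }
    currentWal = newWal;
  }

  /** Stand-in for the close path the issue proposes hooking (TabletServerLogger.close()). */
  private void close(String wal) {
    // Record the marker only once this logger will never write to the WAL again.
    closeMarkers.add(wal);
  }

  /** Only an explicit marker makes a WAL finished for replication. */
  public boolean isClosedForReplication(String wal) {
    return closeMarkers.contains(wal);
  }

  public String currentWal() {
    return currentWal;
  }
}
```

      For example, after roll("wal-1") followed by roll("wal-2"), isClosedForReplication("wal-1") is true while "wal-2", still in use, is not. The design choice is that the writer, which knows when it is done, makes the decision instead of the GC inferring it from a snapshot.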

      If we want to move to explicit "end" tracking (see ACCUMULO-2835), we will need this implemented.

        Issue Links

          This issue blocks ACCUMULO-2835

          Activity

          Josh Elser created issue -
          Josh Elser made changes -
          Link: This issue blocks ACCUMULO-2835
          Josh Elser made changes -
          Description: added the note on the replication-table combiner and Eric Newton's close-marker suggestion
          Josh Elser made changes -
          Assignee Josh Elser [ elserj ]

            People

            • Assignee: Unassigned
              Reporter: Josh Elser
            • Votes: 0
              Watchers: 0
